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Prahlad  T,  Ram 

Functional  proteomic  analysis  of  signaling  networks  and  response  to  targeted  therapy 
DOD-IDEA  BC044268 

Final  Report 

Introduction 

The  purpose  of  the  research  done  has  been  to  determine  the  regulation  of  the  EGFR  network  and  identify  how 
manipulations  of  the  network  alter  signal  flow  to  bypass  targeted  inhibitions.  The  scope  of  the  project  is  to 
understand  the  network  and  determine  which  molecules  have  to  be  targeted  to  inhibit  tumor  cell  proliferation. 

Body 

The  aims  of  the  proposal  were  as  follows. 

1 .  Determine  the  dynamics  of  the  EGFR/MAPK/Stat3/PI3K  signaling  network  in  response  to  EGE 
and  the  drugs  Iressa  and  Herceptin  in  breast  cancer  cells. 

2.  Determine  how  signals  are  integrated  and  routed  within  this  EGE  response  network  and  identify 
important  nodes  that  regulate  the  network. 

3.  Determine  what  combinations  of  targeted  inhibitors  effectively  reduce  proliferation  for  each  of 
the  breast  cancer  cell  lines. 


Since  the  final  report  guidelines  state  that  “Journal  publications  can  be  substituted  for  detailed  descriptions  of 
specific  aspects  of  the  research”  we  have  included  5  of  our  papers  that  describe  results  in  detail. 

Aim  1  of  the  application  was  the  development  of  quantitative  reverse  phase  protein  micro  arrays  to  determine 
the  changes  in  the  signaling  network.  We  have  accomplished  this  task. 


Eigure  1  and  2  below  shows  development  of  the  quantitative  array  to  measure  changes  in  phospho-protien 


Phospho-TSC2  Phospho-STAT3  Phospho-Bad  Phospho-RB 


(T1462)  (Y705)  (S155)  (S807/S811) 

123456  123456  123456  123456 


Phospho-p70S6K  Phospho-AKT  Phospho-MAPK1.2  Phospho-Src 
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Figure  1.  Western  blot  of  MCF10A  cells.  MCF10A 
breast  cells  were  treated  with  Lapatinib,  Dasatnib  or 
DMSO  as  control  followed  by  stimulation  with  EGF.  1. 
Control,  2.  EGF,  3.  Lapatinib,  4.  Lapatinib  +  EGF,  5. 
Dasatnib,  6.  Dasatnib  +  EGF.  Cell  lysates  were 
aliquoted  and  used  on  the  RPPA  and  for  Western  blot 
analysis. 
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Figure  2  RPPA  slides  and  statistical  data  analysis 
of  each  slide.  An  aliquot  of  the  MCF1 OA  lysate  was 
spotted  on  RPPA  slides  and  each  slide  was  probed 
with  the  antibody  listed.  The  same  antibodies  used  for 
the  Western  blot  in  Figure  3  was  used  for  the  RPPA. 
The  graphs  show  the  supercurve  analysis  data  fit,  the 
R^  vlalues  are  between  0.82  and  0.96.. 
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Figure  3.  Quantification  of  changes  in  activity  of  signaling  molecules  using  RPPA.  Purified  protein,  EGF 
treated  and  non-stimulated  MDA-468  cell  lysates  and  phospho-AKT  peptide  were  serial  diluted  and  spotted  on  a 
RPPA  slide.  The  slide  was  probed  using  phospho-S473  AKT  antibody.  The  slide  is  shown  along  with  the  range  of 
concentrations  of  the  samples  spotted  on  the  array. 
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Figure  4  RPPA  from  MCF10A  breast  cells.  MCF10A  cells  were 
incubated  with  2  different  pharmacological  inhibitors  for  4  hours  or 
with  DMSO  as  control.  Cells  were  then  stimulated  with  EGF  or 


vehicle  for  20  minutes.  The  cell  lysates  were  spotted  on  the  RPPA 
and  probed  using  30  different  phospho  and  total  antibodies.  The 
data  is  mean  centered  for  each  antibody:  black  -  mean  intensity 
value,  red  -  increased  intensity  of  signal  and  green  -  decreased 
intensity  of  signal.  Lap-  Lapatinib,  Das-  Dasatinib,  C-control,  A&B 
1 ,2,3  -  experiment  replicate  numbers  (n=6). 


levels  in  cells  upon  treatment  with  EGF. 


Figure  3  shows  the  quantitative  aspect  of  the 
RPPA  whereby  we  are  bale  to  detect  pico 
gram  amounts  of  protein. 

Figure  4.  shows  the  RPPA  data  from  the 
same  lysates  used  in  Fig  1  and  2.  As  can  be 
see  n  in  the  figure  4  we  are  able  to 
simultaneously  measure  protein  levels  of  a 
large  number  of  conditions. 

The  data  in  figure  5  shows  the  quantification 
of  changes  in  the  dynamic  activity  of 
signaling  molecules  upon  stimulation  with 
EGF.  We  have  extensively  investigated  the 
dynamic  changes  in  signaling  of  several 
members  of  the  signaling  network,  including 
MAPK,  AKT,  stats,  Src,  S6K,  p38,  NFkB, 
and  JNK  (please  see  attached  publications). 
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Figure  5  Simultaneous  measurements  of  activity  states  of  molecules  within  the  EGF  signaling  network.  Human 
breast  cancer  cells  (MDA231  and  BT549)  were  stimulated  with  20  ng/ml  EGF  for  the  times  indicated.  The  cells  were  lysed, 
quantified,  and  equal  concentrations  of  lysates  and  their  serial  dilutions  spotted  on  the  array.  The  slides  were  then  probed 
with  phospho-antibodies  to  y705Stat3,  y416Src,  and  s473AKT  as  well  as  antibodies  to  total  Stat3,  Src,  and  AKT.  The  slides 

were  scanned  and  spot  intensity  quantifier 

.  The  change  in  activity  was  measured  for  only  those  concentrations  that  were 

within  the  linear  range.  The  activity  was  normalized  to  total  protein  levels  and  the  data  shown  on  the  graph  above. 

I B  PathwayOracle 


Network  Paths 


Aim2  of  the  grant  was  to  integrate  the  biologieal 
experimental  data  into  a  computational  model  to 
determine  dynamic  properties  of  the  nertwork. 


Figure  6  shows  the  computational  analysis  of 
information  flow  in  the  EGFR  network  from 
experimental  data  measured  by  RPPA.  Please  see 
attached  publications  Ruths  et  al  2008  and 
ladevaia  et  al  2010) 


Figure  6  PathwayOracle  model  of  Lapatinib  signaling  in 
MCF10A  cells.  Red  indicates  increase  in  signal  compared  to 
control  non  treated  cells,  and  green  indicates  a  decrease  in 
signal,  black  are  no  changes  or  non-measured  nodes. 
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Figure  7.  RPPA  analysis  of  BT549  and  MDA231  cells.  The  two 

cell  lines  were  treated  with  MEK  (PD),  EGFR  (IR),  AKT  (Pf) 
inhibitors,  or  control  (C)  and  stimulated  with  EGF  for  30  minutes 
(EGF),  or  non-EGF  stimulated,  a&b  are  two  independent  samples. 
Black  is  mean  intensity,  red  increase,  and  green  decrease  in 
intensity. 


We  perturbed  individual  nodes  in  the  network 
to  determine  how  signaling  is  altered  through 
the  network.  Figure  7  shows  the  RPPA  data 
from  one  such  experiment  where  three 
different  drugs  were  used  in  two  cell  lines 
and  the  cells  were  stimulated  with  EGF. 
Similar  experiments  were  done  for  the  entire 
panel  of  cell  lines. 

Analysis  of  the  data  revealed  several 
feedback  loops  that  became  activated  when 
one  node  was  inhibited.  One  such  loop  was 
the  increase  in  pAKT  when  MEK  whas 
inhibited.  This  is  seen  in  the  RPPA  as  well  as 
by  western  blot  analysis  (Figure  8&9). 

Details  of  these  can  be  found  in  the  attached 
publications. 
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Figure  8.  PD98059  increases  AKT  (Western  blot 
aniaysis).  Aliquots  of  the  lysates  were  probed  for 
phospho  AKT  by  Western  blot  (top  panel). 
Quantification  of  the  data  shows  about  a  1 .8  fold 
increase  in  phospho  AKT. 
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Figure  9.  U0126  (MEK  inhibitor)  increase  AKT 
phosphorylation.  MDA231  cells  were  treated 
with  U0126  and  stimulated  with  EGF.  Western  blot 
analysis  shows  an  increase  in  AKT 
phosphorylation.  MARK  shows  inhibition  in 
resDonse  to  the  MEK  inhibitor. 
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Aim  3  of  the  grant  was  to  use  data  and  the  model  to  determine  optimal  eombinations  for  eaeh  eell  line. 


Based  on  the  experimental  data  (Figure  7-9)  and  our  developed  model  (Figure  10)  we  predicted  different 
combinations  of  targets  (Figure  1 1).  We  experimentally  tested  (Figures  12  &  13)  the  predictions  and  showed 

that  infact  combination  targeting  can  overcome 
deficiencies  of  single  targeted  agents.  These  and 
additional  data  are  seen  in  the  publications  Ruths  et 
al  2008  and  ladevaia  et  al  2010. 


EGF  +  +  +  +  +  + 

MEK  inhibitor  +  +  +  +  + 

TSC2  inhibitor  + 

Pi3K  inhibitor  + 

mTOR  inhibitor  + 

EGFR  inhibitor  + 

Figure  11.  PathwayOracle  modeling  of  different 
inhibitors  in  combination.  Modeling  of  combinations  of 
targeted  inhibition  on  AKT  was  simulated  in 
PathwayOracle.  The  model  predicts  that  a  combination 
of  MEK  and  mTOR  inhibition  will  increase  AKT,  while 
combinations  of  AKT,  TSC2,  and  EGFR  inhibitions  with 
MEK  will  not.  These  predictions  will  be  experimentally 
tested. 


Figure  10  Subnetwork  showing  the  connectivity  of  the 
EGFR/MAPK/AKT  subnetwork.  The  colors  indicate 
changes  in  phosphorylation  in  response  to  EGF  compared 
to  non  stimulated  control  cells.  Red  is  increase  and  green 
is  decrease  in  phosphorylation. 


RPPA  results 


PD98059  +  + 

Lapatinib 

Figure  12.  Lapatinib  inhibits  the 
MEK  increase  in  AKT 
phosphorylation.  MDA231  cells  were 
treated  with  inhibitors  to  MEK,  EGFR 
and  in  combination.  The  combination 
inhibits  the  AKT  increase  in 
phosphorylation . 
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Figure  13.  siRNA  knockdown  of  TSC2  and 
AKT  response  to  EGF  and  MEK  inhibition. 

TSC2  was  knocked  down  in  MDA231  cells  with 
TSC2  siRNA,  control  cells  were  transfected  with 
non  specific  siRNA.  The  cells  were  treated  with 
MEK  inhibitor  or  vehicle  for  2  h  and  stimulated 
with  EGF.  The  lysates  were  probed  for  phospho 
AKT,  total  TSC2,  and  actin.  Knockdown  of  TSC2 
blocks  the  MEK  inhibition  induced  increase  in 
AKT  phosphorylation. 


7 


Key  research  accomplishments 

Aim  1.  We  developed  a  quantitative  RPPA  to  measure  dynamies  of  signaling  network  in  response  to  EGF 
Aim2.  We  developed  a  eomputational  model  and  integrated  the  experimental  data. 

Aim  3.  We  predieted  and  tested  eombinations  of  targets  based  on  underlying  experimental  data. 


Reportable  outcomes 

The  grant  has  help  support  researeh  and  personnel  that  has  resulted  in  5  publieations  (see  referenees).  The  data 
from  the  grant  has  also  allowed  me  to  obtain  a  NIH-ROl  grant  (ROl  CA125109).  The  results  from  the  work  were 
also  given  during  a  talk  and  poster  presentation  at  the  2008  ERA  of  HOPE  meeting  in  Baltimore,  MD. 


Conclusions 

We  have  sueeessfully  eompleted  the  objeetives  of  the  grant.  We  have  learnt  that  targeted  manipulation  of  the 
signaling  network  ean  lead  to  unforeseen  ehanges  elsewhere  in  the  network,  therefore  we  need  to  understand 
what  these  other  changes  are  and  determine  optimal  combinations  to  kill  breast  cancer  cells. 
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Abstract.  Biological  signaling  networks  comprise  the  chemical  processes  by 
which  cells  detect  and  respond  to  changes  in  their  environment.  Such  networks 
have  been  implicated  in  the  regulation  of  important  cellular  activities  including 
cellular  reproduction,  mobility,  and  death.  Though  technological  and  scientific 
advances  have  facilitated  the  rapid  accumulation  of  information  about  signaling 
networks,  utilizing  these  massive  information  resources  has  become  infeasible 
except  through  computational  methods  and  computer-based  tools.  To  date,  visu¬ 
alization  and  simulation  tools  have  received  significant  emphasis.  In  this  paper, 
we  present  a  graph-theoretic  formalization  of  biological  signaling  network  mod¬ 
els  that  are  in  wide  but  informal  use,  and  formulate  two  problems  on  the  graph:  the 
Constrained  Downstream  and  Minimum  Knockout  Problems.  Solutions  to  these 
problems  yield  qualitative  tools  for  generating  hypotheses  about  the  networks, 
which  can  then  be  experimentally  tested  in  a  laboratory  setting.  Using  estab¬ 
lished  graph  algorithms,  we  provide  a  solution  to  the  Constrained  Downstream 
Problem.  We  also  show  that  the  Minimum  Knockout  Problem  is  NP-Hard,  pro¬ 
pose  a  heuristic,  and  assess  its  performance.  In  tests  on  the  Epidermal  Growth 
Factor  Receptor  (EGER)  network,  we  find  that  our  heuristic  reports  the  correct 
solution  to  the  problem  in  seconds.  Source  code  for  the  implementations  of  both 
solutions  is  available  from  the  authors  upon  request. 
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1  Introduction 

In  this  paper,  we  use  the  term  biological  networks  to  refer  to  cell  signaling  networks, 
chains  of  reactions  involved  in  triggering,  propagating,  and  processing  signals  within 
the  cell.  These  networks  regulate  many  cellular  activities  that  are  critical  to  the  health  of 
the  cell  and  the  larger  systems  to  which  it  may  belong.  Altered  biological  networks  have 
been  implicated  as  the  cause  of  many  devastating  diseases  including  cancer  [15],  heart 
disease  [11],  congenital  abnormalities  [9],  metabolic  disorders  [15],  and  immunological 
abnormalities  [15]. 

Significant  research  efforts  to  identify  and  map  biological  networks,  aided  by  new 
technologies  and  scientific  methods,  have  amassed  vast  databases  of  molecules  and  pu¬ 
tative  interactions  among  them.  Given  the  immense  scale  of  networks  now  in  common 
use,  computational  techniques  to  filter,  search,  and  reason  about  them  have  become 
indispensable. 

Existing  research  on  computational  tools  in  this  area  has  focused  on  two  forms  of 
analysis:  visualization  of  the  networks  [17,  13,  22,  10],  and  detailed  simulations  of 
small  subnetworks  based  on  initial  conditions,  reaction  rates,  and  other  molecule  and 
reaction-specific  parameters  [19,  12,  24,  21].  Because  of  the  difficulty  of  determin¬ 
ing  these  parameters,  higher-level  models  are  used  whenever  possible.  Recent  efforts 
have  also  developed  hypothesis  generation  techniques  that  use  model  checking  and  for¬ 
mal  verification  in  order  to  qualitatively  reason  about  networks  [6,  5,  26,  8,  7,  27]. 
Hypothesis  testing  tools  establish  the  set  of  most-likely  outcomes  of  an  experiment, 
providing  insights  into  experimental  design,  thereby  reducing  the  investment  of  time 
and  labor-intensive  laboratory  work.  Existing  hypothesis  generation  tools  require  state¬ 
ments  about  the  properties  of  individual  reactions  in  networks,  details  that  are  often 
unavailable  for  many  networks.  In  this  paper,  we  present  a  framework  for  computa¬ 
tional  hypothesis  testing  that  only  depends  on  the  simplest  property  of  a  reaction  -  its 
reactants  and  products.  Our  framework  combines  currently-used  graph-based  network 
representations  with  graph  algorithms.  We  also  formalize  two  biologically  significant 
problems  useful  for  hypothesis  testing. 

The  Constrained  Downstream  Problem  seeks  the  set  of  reactions  in  a  biological  net¬ 
work  that  leads  from  one  set  of  molecules  to  another,  such  that  the  set  is  constrained  to 
include  reactions  from  a  given  set  and  exclude  reactions  from  another  given  set.  This  is  a 
useful  tool  in  the  design  of  drugs  to  modify  or  inhibit  certain  biological  functions  while 
preserving  others.  At  the  signaling  network  level  this  would  help  to  identify  molecules 
or  sets  of  molecules  that  have  to  be  targeted  to  inhibit  function  of  a  sub-network  while 
preserving  signal  flow  to  a  different  sub-network.  A  biological  endpoint  for  this  type  of 
problem  would  be  if  one  wanted  to  identify  a  molecule  (or  a  set  of  molecules)  to  inhibit 
proliferation  while  at  the  same  time  preserving  metabolic  or  secretary  functions.  We 
provide  a  polynomial-time  algorithm  for  solving  this  problem. 

The  Minimum  Knockout  Problem  seeks  a  minimum-size  set  of  molecules  whose 
removal  (or  knocking  out)  from  the  biological  network  makes  the  production  of  a  set  of 
molecules  impossible  given  an  initial  set  of  molecules.  The  minimum  knockout  problem 
is  very  important  in  the  identification  of  molecular  targets  for  therapies,  especially  in 
cancer.  Traditional  chemotherapeutics  function  by  killing  rapidly  dividing  cells,  the  end 
result  being  both  cancer  cells  as  well  as  normal  cells  are  killed,  hence  the  hair  loss 
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and  gastro-intestinal  side  effects  of  these  drugs.  In  the  past  few  years  there  has  been 
a  great  effort  in  developing  drugs  that  specifically  target  signaling  molecules  that  are 
aberrantly  functioning  in  cancer  cells.  The  clinical  trials  and  data  from  these  drugs 
show  that  they  are  limited  in  their  ability,  and  function  best  in  combination  with  other 
targeted  drugs.  Therefore,  the  biological  problem  here  is  to  identify  the  optimal  and 
minimal  sets  of  molecules  that  have  to  be  targeted  to  block  network  function.  This  will 
allow  the  development  of  therapeutics  that  can  efficiently  kill  cancer  cells  while  still 
preserving  normal  cells. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  introduce  a  graph 
formalization  of  biological  networks.  In  Section  3  we  formulate  the  Constrained  Down¬ 
stream  problem,  and  present  a  polynomial-time  solution  of  it.  In  Section  4  we  formu¬ 
late  the  Minimum  Knockout  problem,  prove  its  NP-hardness,  and  devise  a  randomized 
heuristic  for  solving  it.  In  Section  5  we  analyze  the  accuracy  and  performance  of  the 
proposed  heuristic  for  the  Minimum  Knockout  Problem  on  a  large  biological  network. 
In  Section  6  we  conclude  and  outline  future  research  directions. 

2  Biological  Networks 

Standard  models  of  biological  networks  encompass  various  molecules  and  interactions 
among  them  that  occur  on  and  within  the  cell  membrane.  An  example  of  such  a  model  is 
given  in  Fig.  1.  These  models  consist  of  instances  of  two  fundamental  components:  (1) 
a  molecule,  either  inorganic  (such  as  oxygen,  O2),  or  organic  such  as  proteins,  segments 
of  DNA,  RNA,  or  even  complexes  consisting  of  one  or  more  molecules  attached  to  one 
another;  and  (2)  an  interaction,  which  is  a  change  that  occurs  to  one  or  more  molecules. 
A  change  to  a  molecule  will  either  change  a  property  of  the  molecule  (activity  and/or 
localization),  bind  one  or  more  molecules  together,  or  break  one  or  more  molecules 
apart. 

A  salient  feature  of  biological  networks  is  the  common  occurrence  of  feedback  loops 
(Fig.  1)  in  which  a  molecule  a,  through  a  series  of  interactions,  gives  rise  to  another 
molecule  b  that  directly  changes  properties  of  molecule  a  through  interactions.  A  neg¬ 
ative  feedback  loop  is  a  chain  of  interactions  which  decreases  the  activity  of  molecule 
a  in  the  network;  a  positive  feedback  loop  increases  the  activity  of  molecule  a  in  the 
network. 

2.1  Existing  Representations  and  Models  of  Biological  Networks 

Standard  models  of  biological  networks  have  served  as  the  basis  for  a  number  of  com¬ 
putational  models  allowing  simulating  and  reasoning  about  cell  processes.  We  briefly 
review  some  of  these  models  in  this  section. 

The  Systems  Biology  Markup  Language  (SBML)  is  an  XML  schema  for  represent¬ 
ing  biological  networks  in  addition  to  regulatory  and  metabolic  networks  [1].  The  model 
uses  chemical  nomenclature:  molecules  are  called  species  and  interactions  are  called  re¬ 
actions.  In  the  model,  a  reaction  has  reactant,  product,  and  modifier  species.  Reactants 
and  modifiers  are  both  inputs  to  the  reaction,  differing  only  in  that  a  modifier  affects 
the  reaction  without  undergoing  any  changes.  The  SBML  file  format  and  underlying 
network  structure  is  well-accepted  and  supported  by  a  large  number  of  tools  [1]. 

Petri  Nets  are  considered  hybrid  models  of  networks  because  they  model  both  the 
global  topology  of  the  network  as  well  as  some  of  the  reaction-specific  parameters  that 
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determine  the  quantitative  behavior  of  the  system.  They  have  been  used  with  varying 
success  to  simulate  biological  networks  [28].  In  translating  the  biological  network  into 
a  Petri  Net,  each  molecule  and  each  interaction  is  given  its  own  node.  Each  molecule  is 
initialized  with  some  number  of  ‘tokens’  which  are  then  iteratively  reallocated  by  the 
interactions  to  which  they  are  connected.  For  a  more  detailed  discussion  of  PetriNets, 
see  [23];  for  an  example  application  see  [28]. 

Differential  and  algebraic  models  attempt  to  simulate  the  quantitative  characteristics 
of  a  network  using  mathematical  formulae  as  well  as  constants  and  parameters  that  have 
been  determined  for  each  reaction  in  the  network  [3,  4,  14,  25,  19,  12,  24,  21].  While 
undisputedly  the  most  accurate  of  available  techniques,  these  methods  are  not  yet  able 
to  simulate  large  networks  efficiently  and  accurately. 

Model  checking  and  formal  verification  techniques  use  logical  models  of  networks 
to  make  qualitative  assertions  about  their  temporal  properties  (i.e.,  whether  a  certain 
reaction  will  ever  take  place  under  certain  conditions).  These  tools  have  received  sig¬ 
nificant  attention  due  to  their  ability  to  support  rapid  qualitative  hypothesis  generation 
without  requiring  significant  information  specific  to  the  networks  of  interest  [6,  5,  26,  8, 
7,  27].  In  contrast  to  differential,  algebraic,  and  Petri  Net  models  which  require  numer¬ 
ical  parameters,  logical  models  require  qualitative  properties  of  individual  reactions. 
Further,  the  temporal  logic  mechanisms  that  these  approaches  use  are  limited  in  their 
expressive  power  vis-a-vis  general  querying  of  biological  networks. 

The  rapid  pace  of  lab-based  research  on  biological  networks  forces  biologists  to 
deal  with  large  numbers  of  reactions  and  molecules.  The  lack  of  much  quantitative  or 
qualitative  data  for  these  reactions  and  molecules,  coupled  with  the  complexity  of  ques¬ 
tions  that  biologists  would  ask  about  the  networks,  significantly  hinder  the  applicability 
and  appropriateness  of  the  techniques  described  above.  There  is  a  significant  need  for 
tools  that  provide  hypothesis  generation  capabilities  in  the  absence  of  detailed  network 
information.  For  many  known  networks,  the  only  experimentally  confirmed  detail  is  the 
existence  of  reactions  and  the  identities  of  their  reactants  and  products.  To  the  best  of 
our  knowledge,  hypothesis  generation  tools  that  operate  on  this  information  alone  are 
lacking.  In  this  paper,  we  propose  a  model  and  approach  for  such  tools  and  provide  a 
working  implementation  of  two  problems  useful  for  hypothesis  generation. 

2.2  A  Graph  Formulation  of  Biological  Networks 

Numerous  tools  exist  that  visualize  networks  as  graphs  in  which  molecules  and  inter¬ 
actions  appear  as  interconnected  nodes  [17,  13,  22,  10].  Here,  we  formalize  this  graph 
representation  in  order  to  provide  a  model  on  which  hypothesis  testing  problems  can 
be  posed,  analyzed  and  solved.  In  this  section  we  introduce  this  graph-theoretic  model, 
which  we  call  the  Pathway  Graph. 

Definition  1.  A  pathway  Graph  is  a  directed  graph,  G  =  {V° ,  ,  E),  with  two  types 
of  nodes:  molecule-nodes,  V°,  and  interaction-nodes,  V°,  with  the  following  proper¬ 
ties: 

1.  n  =  0  and 

2.  For  every  (u,  v)  G  E,  u  and  v  are  not  of  the  same  type. 

Property  (1)  in  Definition  1  implies  that  each  node  in  the  graph  is  either  a  molecule-node 
or  an  interaction-node.  Property  (2)  reflects  that  the  fact  that,  biologically,  a  molecule 
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(a)  (b) 


Fig.  1.  (a)  A  diagram  of  a  pathway  graph  model  of  a  biological  network.  Circles  represent 
molecules  and  rectangles  represent  interactions,  (b)  The  Remove  procedure  for  removing  a  node 
from  a  pathway  graph  and  propagating  its  effect. 

cannot  directly  produce  another  molecule  except  through  an  interaction,  nor  can  a  reac¬ 
tion  lead  to  another  reaction  except  through  a  molecule. 

The  effect  of  the  “removal”  of  molecule-nodes  from  the  pathway  graph  is  of  signif¬ 
icant  interest  to  researchers  because  it  models  the  effect  of  drugs  that  inhibit  sections  of 
the  network.  In  particular,  they  are  interested  in  the  connectivity  of  the  graph  resulting 
from  the  removal  of  those  nodes.  Biologically,  the  effect  of  removing  a  node  u  in  a 
pathway  graph  usually  propagates  further  to  other  nodes  reachable  from  v.  interactions 
involving  the  removed  molecule  can  no  longer  occur,  the  products  of  those  interactions 
are  no  longer  produced,  and  so  on.  Procedure  Remove  in  Fig.  1(b)  is  a  formal  descrip¬ 
tion  of  the  “propagation  effect”  of  the  removal  of  a  node  u  in  a  pathway  graph.  An 
additional  set  of  nodes,  Y,  is  also  specified  to  indicate  the  nodes  at  which  to  terminate 
the  propagation.  The  Remove  procedure  can  be  extended  to  apply  to  a  set  of  nodes  in  a 
straightforward  manner;  Remove(G{V° ,  ,E),X,  Y).  In  this  case.  Remove  is  applied 

successively  to  the  nodes  in  X  (this  application  yields  the  same  result  regardless  of  the 
order  of  nodes). 

3  The  Constrained  Downstream  Problem 

A  critical  piece  of  information  necessary  to  predict  the  outcome  of  biological  net¬ 
work  experiments  is  the  set  of  molecules  and  interactions  dependent  on  a  given  set 
of  molecules  and/or  interactions.  In  the  pathway  graph  model,  all  elements  in  the  graph 
that  are  dependent  on  (downstream  from)  a  set  of  molecules  and  interactions  are  reach¬ 
able  from  that  set. 

Because  of  feedback  loops  in  the  network,  often  the  set  of  downstream  nodes  will 
contain  a  significant  portion  of  the  network,  sometimes  the  entire  network.  In  order 
to  reduce  the  number  of  downstream  nodes  returned,  a  biologist  may  choose  to  apply 
certain  constraints  to  the  downstream  node  search.  Constraints  restrict  the  solution  to 
the  set  of  nodes  belonging  to  a  path  from  nodes  in  set  S  to  nodes  in  set  T  that  includes 
one  or  more  nodes  contained  in  set  I  and  not  containing  any  nodes  in  set  X.  These 
nodes  belong  to  a  subset  of  all  possible  downstream  nodes.  This  is  a  useful  tool  in 
the  design  of  drugs  to  modify  or  inhibit  certain  biological  functions  while  preserving 
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others.  At  the  signaling  network  level  this  would  help  to  identify  molecules  or  sets  of 
molecules  that  have  to  be  targeted  to  inhibit  function  of  a  sub-network  while  preserving 
signal  flow  to  a  different  sub-network. 

Problem  1.  The  CONSTRAINED  Downstream  Problem 

Input:  Pathway  graph  G  =  {V°,V°,E)  and  four  sets  S,T  C  V°  and  I,X  C 
{V°UV°). 

Output:  Subgraph  G'  =  {U°,  U°,E')  where 

1.  U°  C  l/°,  U°  C  l/°,  and  E'  C  E- 

2.  Vu  G  (C/°  U  U°),  3[s  G  S'  ,  f  G  T]  such  that  u  and  u  ^ 

3.  ([/°U[/°)nX  =  0; 

4.  every  path  from  a  node  in  S  to  a  node  in  T  passes  through  a  node  in  /;  and 

5.  G'  is  the  maximum  subgraph  that  satisfies  conditions  1^. 


G'  =  FindDownstream(G  ={V°,V°,  E),  S,T,I,X) 

1.  Vi  G  T,  Visited[t]  =  1;  Vi  ^  T,  Visited[t]  =  0; 

2.  Vi  G  T,  OnPath[t]  =  1;  Vi  ^  T,  OnPath[t]  =  0; 

3.  Vu  G  1V°  U  AboveInclude[v]  =  0; 

4.  G'  =  (0,0,0); 

5.  For  every  s  G  S 

C alcDownstream{G ,  s,  I,  0,  G'). 

6.  Return  G'; 

CalcDownstream(G,  v,  I,  P,  G') 

1.  \fVisited[v]  ==  0 

(a)  Visited[v\  =  1; 

(b)  Let  G  be  the  children  of  v; 

(c)  For  every  c  £  G 

GalcDownstream{G,c,  (pi, . . .  ,pk,v))', 

2.  Else 

(a)  If  OnPath[v]  ==  0 
Return; 

(b)  If  AboveInclude[v]  ==  1 

i.  Cg-  =  Vg'  U  P; 

ii.  Eg'  =  Eg'  U  {(pr,P2), .  •  • ,  (pfe-i,pfc)}; 
hi.  Vp  G  P,  AboveInclude[p\  =  1; 

(c)  Else  If  P  n  7  /  0 

i.  Cg'  =  Vg'  U  P; 

ii.  Eg'  =  Eg'  U  {(pr,P2), .  •  • ,  (pfe-i,pfc)}; 

iii.  Vp  G  P,  AboveInclude[p\  =  1; 

(d)  Else,  Return; 


Fig.  2.  The  algorithm  for  the  Constrained  Downstream  Problem. 

The  algorithm  in  Fig.  2  solves  the  Constrained  Downstream  Problem  with  time  com¬ 
plexity  0{\V°  U  V°  I).  Analysis  of  the  running  time  as  well  as  proof  of  the  correctness 
of  the  algorithms  are  omitted  due  to  space  constraints. 

We  write  a:  -S  p  to  denote  that  node  y  is  reachable  from  node  x  in  graph  G. 
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4  The  Minimum  Knockout  Problem 

A  problem  of  significant  interest  to  experimental  biologists  researching  networks  impli¬ 
cated  in  disease  is  the  minimum  knockout  problem.  In  this  problem,  for  a  given  pathway 
graph,  a  minimal  set  of  nodes  is  sought  such  that  the  removal  of  these  nodes  discon¬ 
nects  a  given  set  of  (source)  molecules,  S  C  V°,  from  another  given  set  of  (target) 
molecules,  T  C  V°.  The  minimum  knockout  problem  is  very  important  in  the  identifi¬ 
cation  of  molecular  targets  for  therapies,  especially  in  cancer.  Traditional  chemothera- 
peutics  function  by  killing  rapidly  dividing  cells,  the  end  result  being  both  cancer  cells 
as  well  as  normal  cells  are  killed,  hence  the  hair  loss  and  gastro-intestinal  side  effects 
of  these  drugs.  Therefore,  the  biological  problem  here  is  to  identify  the  optimal  and 
minimal  sets  of  molecules  that  have  to  be  targeted  to  block  network  function.  Formally, 
we  define  the  problem  (decision  version)  as  follows. 

Problem  2.  The  Minimum  Knockout  Problem  (MKO) 

Input:  Pathway  graph  G  =  {V° ,V°  ,E),  two  sets  of  nodes  S,T  C  V°,  and  a 
positive  integer  Q. 

Question:  Does  there  exist  a  set  C/  C  ((K°  U  V°)  —  (S'  U  T))  with  |[/|  <  Q  such 
that  Remove(G',C/,  S  U  T)  yields  graph  G'  =  {V'° ,  V'°  ,E')  in  which  for  every 
s  G  S  and  t  G  T,  s  tl 

We  first  prove  that  MKO  is  NP-Hard,  and  then  present  an  efficient  and  accurate  ran¬ 
domized  heuristic  for  solving  it.  We  prove  the  NP-hardess  of  the  problem  by  a  reduction 
from  the  Minimum  Set  Cover  Problem  [16]. 

Problem  3.  The  Minimum  Set  Cover  Problem  (MSC) 

Instance:  Collection  G  of  subsets  of  a  finite  set  B  and  a  positive  integer  K  <  \G\. 
Question:  Does  G  contain  a  cover  for  B  of  size  K  or  less,  i.e.,  a  subset  G'  C  C 
with  \G'\  <  K  such  that  every  element  of  B  belongs  to  at  least  one  member  of  C'l 

Theorem  I.  MKO  is  NP-Hard. 

Proof.  Given  an  instance  {B  =  {bi, . . . ,  &m})  C  =  {Ci, . . . ,  G„},  K)  of  MSC,  we 
construct  an  instance  (G,  S',  T,  Q)  of  MKO  as  follows. 

-  Pathway  graph  G  =  {V° ,  V°,E)  where 

u  v°  =  {si,  Ui  :  Gi  e  G}  U  {ti  :  bi  G  B}. 

•  =  {f^  :  G,  G  G}  U  {g,  :  h  G  B}. 

•  E  =  {{si,fi)  :  1  <  i  <  n}  U  {{fi,Ui)  :  1  <  i  <  n}  U  {{ui,gj)  :  bj  G 

C'J  U  {{gi,ti)  :  1  <  i  <  m}. 

-  S={s,  GC°}. 

-T={U€  C°}. 

-  Q  =  K. 

Fig.  3  gives  an  example  of  the  construction.  The  graph  G  constructed  by  the  reduction 
satisfies  the  conditions  of  Definition  1,  and  hence  it  is  a  pathway  graph.  We  now  estab¬ 
lish  the  validity  of  the  reduction  by  showing  that  (B,  G,  K)  is  a  yes-instance  of  MSC  if 
and  only  if  (G,  S,  T,  Q)  is  a  yes-instance  of  MKO. 
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Fig.  3.  The  G,  S,  and  T  components  of  the  MKO  instance  constructed  by  the  reduction 
in  the  proof  of  Theorem  1  for  the  MSC  instance  with  B  —  {bi,  62,  fea,  &4,  bs,  be},  C  = 

{{61,62},  {63},  {63, 64,  65},  {65,  be},  {be}},  and  7T  =  3;  Q  =  3. 

Let  C'  C  C  with  |  C'  |  <  iT  be  a  cover  for  B.  Then,  by  construction,  every  node  in  the 
set  Y  =  {gi  G  y°}  has  an  incoming  edge  from  a  node  in  the  set  X  =  {ui  :  Ci  G  C'}. 

Since  Y  contains  only  interaction  nodes,  applying  Remove  (Fig.  1(b))  to  all  nodes  in 
X  will  disconnect  all  paths  from  nodes  in  S  to  nodes  in  T.  Since  |X|  =  \C'\  <  K  and 
Q  —  K,  it  follows  that  (G,  S',  T,  Q)  is  a  yes-instance  of  MKO. 

<;=  Assume  there  does  not  exist  a  cover  of  size  K  or  less  for  B.  Then,  for  every  set 
C'  C  C  with  \  C'\  <  K,  there  is  at  least  one  b'  €  B  such  that  b'  ^  Ucgc'C-  By  construc¬ 
tion  of  G,  it  follows  that  for  any  subset  of  X  =  {ui  :  Ci  G  G}  of  size  Q  or  less,  there 
exists  at  least  one  node  in  F  =  {gi  G  V'^}  that  is  not  a  child  of  any  node  in  X.  Hence, 
removing  all  nodes  in  X  will  not  disconnect  all  paths  from  S  to  T.  Since  every  node 
in  T  has  a  unique  parent  in  Y,  it  follows  that  there  does  not  exist  a  set  of  nodes  of  size 
Q  or  less  that  disconnects  all  paths  from  nodes  in  S  to  nodes  in  T.  Hence,  (G,  S,  T,  Q) 
is  not  a  yes-instance  of  MKO.  This  finishes  the  proof,  thus  establishing  that  MKO  is 
NP-hard. 

4.1  An  Efficient  and  Accurate  Randomized  Heuristic  for  MKO 

We  now  give  an  efficient  and  accurate  heuristic  for  solving  MKO;  the  heuristic  is  an  it¬ 
erative  randomized  search,  with  running  time  0{nmk),  where  n  is  the  number  of  nodes 
in  the  input  pathway  graph,  m  is  the  number  of  nodes  in  the  constrained  downstream 
subgraph,  and  k  is  the  number  of  iterations.  In  the  worst-case  scenario,  m  =  n-,  how¬ 
ever,  in  our  experiments  on  a  large  pathway  graph,  we  found  that  k,  m  «  n.  The 
heuristic  is  outlined  in  Fig.  4,  and  makes  use  of  the  following  lemma. 

Lemma  1.  Let  U  be  a  minimum  knockout  set  for  S  and  T  in  graph  Gd  =  Downstream(G,  S,  T), 
where  Gd  =  {Vf  ^  Vd  >  ^d)-  Then,  there  exists  a  minimum  knockout  set  U'  such  that 

1.  |C/'|  =  \U\  and 

2.  UC  {Vf  U  {Children{S)  n  V°)). 

The  formal  proof  is  omitted  due  to  space  constraints.  Intuitively,  this  lemma  states 
that  if  node  v  G  V°  is  an  element  of  a  solution  to  MKO,  then  v  can  be  replaced  by 
some  v'  G  V°  where  (w',  v)  is  an  edge  in  the  graph.  The  validity  of  this  lemma  follows 
from  the  definition  of  the  Remove  procedure  (Fig.  1).  The  only  V°  nodes  that  cannot 
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MinKnockoutCG  =  {V°  ,V°  ,E),  S,T,m) 

U  =  FindDownstream{G,  S,T,0,0) 

2.  u°  =  unv° 

3.  [/□  = 

4.  C  =  {u  GU  :uGU°  W  u  G  {Children{S)  n  [7°)} 

5.  For  i  =  1  to  m 

(a)  G'  =  G 

(b)  Si  =  0 

(c)  While  SS^T 

i.  c  €  (G  -  S^) 

ii.  G'  =  Remove{G' ,  c,  S,  T) 

iii.  Si  =  SiU  {c} 

6.  j  —  argmaxi\Si\ 

7.  Return  Sj 


Fig.  4.  An  iterative  and  randomized  heuristic  for  MKO. 

be  replaced  in  this  manner  are  the  children  of  S  (since  elements  of  S  cannot  appear  in 
the  solution). 

The  intuition  for  the  heuristic  is  to  exhaustively  search  a  small  (relative  to  the  size 
of  the  actual  pathway  graph)  set  of  nodes  for  the  smallest  knockout  set  for  S  and  T.  By 
constructing  the  set  of  nodes  so  that  it  does  contain  a  knockout  set  (though  not  neces¬ 
sarily  a  globally  minimal  knockout  set),  the  algorithm  is  guaranteed  to  find  a  solution, 
though  it  may  not  be  minimal. 

Before  we  describe  our  heuristic,  we  review  background  material  that  will  be  used 
in  the  heuristic.  Given  a  directed  graph  G  =  {V,  E),  and  two  sets  S,T  C  V,  a  path  in 
G  is  an  S — T  path  if  it  runs  from  a  node  in  S'  to  a  node  in  T.  A  set  C  C  V  is  called 
S — T  disconnecting  if  G  intersects  each  S — T  path  (C  may  intersect  S  U  T). 

Theorem  2.  (Menger’s  Theorem  [20])  Let  G  =  {V,  E)  be  a  directed  graph  and  let 
S,T  C  V.  Then,  the  maximum  number  of  node-disjoint  S — T  paths  is  equal  to  the 
minimum  size  of  an  S — T  disconnecting  node  set. 

Now,  we  are  in  position  to  describe  our  heuristic.  We  construct  the  search  set,  C  (line 
4),  to  have  the  properties  of  containing  a  knockout  set  and  being  small  relative  to  the 
number  of  nodes  in  the  entire  pathway  graph  as  follows. 

1.  G  =  FindDownstream{G,  S,  T,  0,  0).  By  Menger’s  theorem,  a  knockout  set  is 
contained  in  the  set  of  nodes  that  comprise  all  paths  connecting  S  and  T  because 
one  such  knockout  set  is  a  node  from  each  disjoint  path  connecting  S  to  T.  Since 
the  Constrained  Downstream  Problem  constructs  this  set,  G  contains  a  knockout 
set.  In  addition,  the  constrained  set  of  downstream  nodes  for  most  choices  of  S  and 
T  will  contain  far  fewer  nodes  than  the  entire  pathway  graph. 

2.  G  =  {G nV°)U {GnChildren{S)).  Lemma  1  states  that,  except  for  the  nodes  with 
elements  of  S  as  inputs,  any  node  that  occurs  in  a  minimum  knockout  set  can 
be  replaced  by  a  single  V°  node.  Thus,  by  searching  only  the  V°  nodes  between  S 
and  T,  the  search  set  is  further  reduced  in  size  and  still  contains  a  knockout  set. 


Hypothesis  Generation  in  Biological  Networks 


9 


Fig.  5.  (a)  The  distribution  of  nodes  in  the  EGFR  pathway  graph  by  the  total  number  of  nodes  in 
the  pathway  graph  they  can  reach,  (b)  A  screen  shot  from  the  implementation  of  the  model  and 
algorithms  in  this  paper. 


After  constructing  the  search  set  C  (lines  1 — 4),  the  algorithm  performs  m  random¬ 
ized  searches  over  all  nodes  in  set  C  for  a  knockout  set.  Within  each  search  (loop  on 
line  5c),  a  knockout  set  is  iteratively  constructed  by  removing  a  randomly  selected  node 
(line  5(c)i)  from  the  graph  until  S  and  T  are  disconnected.  Of  the  m  knockout  sets 
constructed,  the  knockout  set  with  fewest  members  is  returned. 

While  the  heuristic  does  not  guarantee  an  upper-bound  on  error,  our  experiments 
show  that,  for  the  Epidermal  Growth  Factor  Receptor  (EGFR)  biological  network  [22], 
this  heuristic  finds  a  minimum-knockout  set  every  time.  We  discuss  the  experiments 
and  performance  in  more  detail  next  in  Section  5. 

5  Experimental  Results  and  Discussion 

We  studied  the  performance  of  the  heuristic  for  the  Minimum  Knockout  Problem  on 
the  biological  data  set  published  in  [22].  In  this  work,  the  authors  constructed  a  com¬ 
prehensive  epidermal  growth  factor  receptor  (EGFR)  signaling  network.  This  network 
is  known  to  have  a  significant  role  in  cancer  development  and  proliferation.  Given  the 
amount  of  research  currently  focused  on  this  network,  benchmarks  for  our  heuristic 
on  this  network  will  likely  give  a  very  accurate  sense  of  how  well  the  heuristic  will 
perform. 

The  EGFR  network  contains  292  interactions  involving  330  different  molecules. 
The  graph  of  this  network  is  highly  connected  as  shown  in  Fig.  5(b).  Nearly  half  of  the 
molecules  reach  between  350  and  400  nodes  in  the  graph. 

To  test  the  heuristic,  we  manually  selected  30  pairs  of  S  and  T  node  sets  from  the 
network.  The  only  selection  criteria  applied  was  a  rough  attempt  to  choose  S  and  T 
so  that  the  nodes  in  opposite  sets  were  far  from  one  another,  increasing  the  likelihood 
of  non-trivial  solutions.  Beyond  this,  the  nodes  were  selected  at  random.  Sets  varied  in 
size  from  1  to  10. 

Heuristic  Accuracy:  The  heuristic  was  set  to  run  for  100  iterations  on  each  of  the 
(S',  r)  pairs.  In  every  case,  the  heuristic  reported  a  minimum  knockout  set  of  size  1. 
Since  the  smallest  possible  minimum  knockout  set  has  cardinality  equal  to  1,  we  con- 
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eluded  that  every  time  the  heuristic  correctly  identified  a  minimum  knockout  set.  This 
result  is  remarkable  for  two  reasons. 

(1)  A  minimum  knockout  set  of  1  occurs  with  unexpected  frequency.  This  result  is 
best  explained  by  the  degree  of  connectivity  in  the  network.  Fig.  5(a)  shows  that  over 
half  of  the  nodes  in  the  network  have  very  extensive  connectivity  within  the  graph.  This 
is  consistent  with  other  studies  of  connectivity  within  biological  networks  [18,  2,  29]. 
Because  of  the  properties  of  the  Remove  operation,  removing  such  a  highly  connected 
node  from  the  graph  will  have  global  impact  on  the  connectivity  of  other  nodes. 

(2)  The  heuristic  correctly  found  a  minimum  knockout  set  every  time.  This  is  cer¬ 
tainly  a  property  of  the  network:  if  one  chooses  a  molecule  at  random  in  the  network, 
there  is  a  50%  chance  that  it  will  connect  to  400  other  nodes  in  the  network.  Ultimately, 
while  it  is  easy  to  envision  cases  that  will  be  difficult  for  the  heuristic  to  handle,  our 
results  indicate  that  the  EGFR  network,  well-studied  and  important  in  research,  has  few 
difficult  cases,  if  any. 

We  observed  that,  though  correct,  often  the  heuristic  chose  nodes  which  discon¬ 
nected  large  sections  of  the  graph  from  S.  Biologically,  it  is  favorable  to  target  nodes 
that  have  the  smallest  global  impact  while  still  disconnecting  S  and  T.  While,  prelim¬ 
inary  analysis  was  not  able  to  establish  whether  the  disconnection  of  these  nodes  was 
necessary,  we  consider  the  problem  of  identifying  the  minimum  knockout  set  that  also 
minimizes  the  number  of  non-T  nodes  disconnected  from  S  to  be  an  important  exten¬ 
sion  to  this  problem. 

Heuristic  Performance:  We  implemented  the  heuristic  shown  in  Fig.  4  in  Java.  A 
set  of  100  iterations  of  the  algorithm  took  approximately  one  second  to  complete  on  a 
Apple  1.33  GHz  G4  laptop  running  Mac  OS  X. 

A  tool  implementing  the  model  and  algorithms  described  in  this  paper  is  available 
for  download.  The  model  and  algorithms  are  implemented  in  Matlab;  for  better  perfor¬ 
mance,  the  interface  is  implemented  in  Java.  Fig.  5(b)  shows  a  screenshot  of  the  tool 
in  use  on  a  small  model  of  the  AKT  network.  The  tool  can  load  networks  stored  in  the 
SBML  format,  allowing  biologists  to  import  networks  designed  in  CellDesigner  and 
other  biological  network  editors  [13]. 

6  Conclusions  and  Future  Work 

In  this  paper  we  have  presented  a  formal  graph  model  that  permits  the  use  of  graph  the¬ 
ory  to  reason  about  the  properties  of  biological  networks.  In  addition,  we  have  charac¬ 
terized  two  important  research  questions  pertaining  to  biological  networks,  formulated 
them  on  our  model,  and  provided  efficient  and  accurate  algorithms  for  solving  them.  To 
our  knowledge,  this  is  the  first  paper  to  formally  define  and  propose  a  computational  so¬ 
lution  to  the  Minimum  Knockout  Problem.  Despite  being  NP-Hard,  our  heuristic  shows 
excellent  performance  on  a  large  and  important  network  in  the  research  community. 

Moving  forward,  we  recognize  that  a  useful  addition  to  the  current  heuristic  for  the 
Minimum  Knockout  Problem  is  the  ability  to  return  a  set  of  minimum  knockout  sets 
rather  than  just  a  single  one.  Furthermore,  we  intend  to  consider  additional  biological 
constraints,  such  as  selecting  the  minimum  knockout  set  with  the  least  impact  to  global 
connectivity  of  the  graph.  There  is  also  work  to  be  done  in  studying  other  existing  and 
new  problems  under  the  pathway  graph  model. 
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Abstract 

Reconstructing  cellular  signaling  networks  and  understanding  how  they  work  are  major  endeavors  in  cell  biology.  The  scale 
and  complexity  of  these  networks,  however,  render  their  analysis  using  experimental  biology  approaches  alone  very 
challenging.  As  a  result,  computational  methods  have  been  developed  and  combined  with  experimental  biology 
approaches,  producing  powerful  tools  for  the  analysis  of  these  networks.  These  computational  methods  mostly  fall  on 
either  end  of  a  spectrum  of  model  parameterization.  On  one  end  is  a  class  of  structural  network  analysis  methods;  these 
typically  use  the  network  connectivity  alone  to  generate  hypotheses  about  global  properties.  On  the  other  end  is  a  class  of 
dynamic  network  analysis  methods;  these  use,  in  addition  to  the  connectivity,  kinetic  parameters  of  the  biochemical 
reactions  to  predict  the  network's  dynamic  behavior.  These  predictions  provide  detailed  insights  into  the  properties  that 
determine  aspects  of  the  network's  structure  and  behavior.  However,  the  difficulty  of  obtaining  numerical  values  of  kinetic 
parameters  is  widely  recognized  to  limit  the  applicability  of  this  latter  class  of  methods.  Several  researchers  have  observed 
that  the  connectivity  of  a  network  alone  can  provide  significant  insights  into  its  dynamics.  Motivated  by  this  fundamental 
observation,  we  present  the  signaling  Petri  net,  a  non-parametric  model  of  cellular  signaling  networks,  and  the  signaling 
Petri  net-based  simulator,  a  Petri  net  execution  strategy  for  characterizing  the  dynamics  of  signal  flow  through  a  signaling 
network  using  token  distribution  and  sampling.  The  result  is  a  very  fast  method,  which  can  analyze  large-scale  networks, 
and  provide  insights  into  the  trends  of  molecules'  activity-levels  in  response  to  an  external  stimulus,  based  solely  on  the 
network's  connectivity.  We  have  implemented  the  signaling  Petri  net-based  simulator  in  the  PathwayOracle  toolkit,  which  is 
publicly  available  at  http://bioinfo.cs.rice.edu/pathwayoracle.  Using  this  method,  we  studied  a  MAPK1,2  and  AKT  signaling 
network  downstream  from  EGFR  in  two  breast  tumor  cell  lines.  We  analyzed,  both  experimentally  and  computationally,  the 
activity  level  of  several  molecules  in  response  to  a  targeted  manipulation  of  TSC2  and  mTOR-Raptor.  The  results  from  our 
method  agreed  with  experimental  results  in  greater  than  90%  of  the  cases  considered,  and  in  those  where  they  did  not 
agree,  our  approach  provided  valuable  insights  into  discrepancies  between  known  network  connectivities  and  experimental 
observations. 
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Introduction 

Signaling  networks  are  complex,  interdependent  cascades  of 
signals  that  process  extracellular  stimuli,  received  at  the  plasma 
membrane  of  a  cell,  and  funnel  them  to  the  nueleus,  where  they 
enter  the  gene  regulatory  system.  These  signaling  networks 
underlie  how  cells  communicate  with  one  another,  and  how  they 
make  decisions  about  their  phenotypic  changes,  such  as  division, 
differentiation,  and  death.  Further,  malfunction  of  these  networks 
may  alter  phenotypic  changes  that  cells  are  supposed  to  undergo 
under  normal  conditions,  and  potentially  lead  to  devastating 
consequences  on  the  organism.  For  example,  altered  cellular 
signaling  networks  can  give  rise  to  the  oncogenic  properties  of 
cancer  cells  [1,2],  increase  a  person’s  susceptibility  to  heart  disease 
[3],  and  have  been  shown  to  be  responsible  for  many  other 


devastating  diseases  such  as  congenital  abnormalities,  metabolic 
disorders  and  immunological  abnormalities  [1,4]. 

In  light  of  the  crucial  role  signaling  networks  play  in  the 
proper  functioning  of  cells  and  biological  systems  as  a  whole,  and 
given  the  grave  consequences  their  alterations  may  have  on  the 
behavior  of  cells,  elucidating  the  connections  in  the  networks, 
and  understanding  how  they  operate,  are  two  central  questions 
in  cell  biology.  Flowever,  unlike  the  “pathway  view”  of  signaling 
as  linear  cascades,  signaling  networks  are  highly  interconnected, 
involve  cross-talk  among  several  pathways,  and  contain  feedback 
and  feed-forward  loops  [5].  Figure  1  illustrates  this  issue  in  a 
network  of  signaling  cascades,  which  is  stimulated  by  EGF  and 
contains  several  players  in  cancer  pathways.  For  example, 
multiple  paths  lead  from  EGFR  to  mTOR-Raptor,  resulting  in 
feed-forward  loops.  Some  of  these  paths  activate  mTOR-Raptor, 
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Author  Summary 

Many  cellular  behaviors  including  growth,  differentiation, 
and  movement  are  influenced  by  external  stimuli.  Such 
external  stimuli  are  obtained,  processed,  and  carried  to  the 
nucleus  by  the  signaling  network — a  dense  network  of 
cellular  biochemical  reactions.  Beyond  being  interesting 
for  their  role  in  directing  cellular  behavior,  deleterious 
changes  in  a  cell's  signaling  network  can  alter  a  cell's 
responses  to  external  stimuli,  giving  rise  to  devastating 
diseases  such  as  cancer.  As  a  result,  building  accurate 
mathematical  and  computational  models  of  cellular 
signaling  networks  is  a  major  endeavor  in  biology.  The 
scale  and  complexity  of  these  networks  render  them 
difficult  to  analyze  by  experimental  techniques  alone, 
which  has  led  to  the  development  of  computational 
analysis  methods.  In  this  paper,  we  present  a  novel 
computational  simulation  technique  that  can  provide 
qualitatively  accurate  predictions  of  the  behavior  of  a 
cellular  signaling  network  without  requiring  detailed 
knowledge  of  the  signaling  network's  parameters.  Our 
approach  makes  use  of  recent  discoveries  that  network 
structure  alone  can  determine  many  aspects  of  a 
network's  dynamics.  When  compared  against  experi¬ 
mental  results,  our  method  correctly  predicted  90%  of 
the  cases  considered.  In  those  where  it  did  not  agree,  our 
approach  provided  valuable  insights  into  discrepancies 
between  known  network  structure  and  experimental 
observations. 


while  others  inhibit  it.  Further,  the  network  contains  two 
feedback  loops,  one  from  p70S6K  to  EGFR  and  another  from 
MAPK1,2  to  EGFR. 

These  and  other  complexities  make  it  very  difficult  to  analyze 
signaling  networks  by  experimental  biology  approaches  alone.  As  a 
result,  computational  methods  have  been  developed  and  com¬ 
bined  with  experimental  biology  approaches,  producing  powerful 
tools  for  the  analysis  of  these  networks  [6].  These  computational 
methods  produce  hypotheses  that  guide  the  experimental  design, 
leading  to  more  informative  experiments,  while  experimental 
results  help  refine  the  computational  models,  resulting  in  more 
accurate  predictive  tools. 

In  a  recent  survey,  Papin  et  al.  classified  existing  computational 
methods  into  two  categories:  structural  and  djnamu  network  analysis 
[6].  Structural  network  analysis  is  mainly  based  on  the  network’s 
connectivity,  which  is  typically  readily  available  from  numerous 
public  signaling  network  databases  (e.g.,  [7-9]),  and  makes 
inferences  about  global  network  properties  as  well  as  individual 
protein  functions.  This  category  can  be  further  refined  into  two 
sub-categories,  both  of  which  are  solely  based  on  connectivity 
information,  yet  differ  in  the  type  of  answers  they  provide.  For 
example,  the  methods  described  in  [10-13]  infer  “static” 
properties  of  the  network,  such  as  numbers  of  paths,  reachability 
results,  etc.  In  a  series  of  papers,  Palsson  and  co-workers  [6,14—16] 
introduced  extreme  pathway  analysis  techniques,  which  are  more 
appropriate  for  metabolic  networks,  yet  have  been  applied  to 
signaling  networks  to  characterize  various  properties  of  networks, 
such  as  redundancy  and  cross-talk.  Similar  analyses  have  also  been 


Growth 

Factors  EQp 


Figure  1 .  The  Model  Signaling  Network.  A  MAPK1,2  and  AKT  network  downstream  from  EGFR,  which  we  assembled  from  various  sources,  and  used 
for  the  case  study  analysis  in  this  work.  An  edge  from  u  to  v  ending  with  an  arrow  indicates  an  activating  reaction,  while  an  edge  ending  with  a  plunger 
indicates  an  inhibiting  reaction.  With  the  exception  of  TSC2,  all  nodes  have  self-inhibitory  edges,  which  were  added  to  model  the  external  cellular 
machinery  that  regulates  the  concentration  of  the  active  form  of  the  proteins  [36-43].  Colors  were  selected  to  enhance  readability  of  the  network. 
doi:1 0.1 371/journal.pcbi.1  OOOOOS.gOOl 
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formalized  and  conducted  using  the  principles  of  S-  and  T- 
invariants  in  Petri  Nets  (e.g.,  [1 7—20]). 

Methods  for  dynamic  network  analysis  use,  in  addition  to  the 
network  connectivity,  the  kinetic  parameters  of  the  biochemical 
reactions.  The  goal  of  these  methods  is  to  model  the  actual  kinetics  of 
the  network  and  obtain  through  simulation  the  actual  quantities  of 
proteins  involved  in  signal  transduction.  One  of  the  most  widely  used 
techniques  in  this  category  is  systems  of  ordinary  differential 
equations  (ODEs)  (e.g.,  [21-25]).  Within  such  a  system,  each 
reaction  is  modeled  by  a  series  of  equations  connecting  reactant 
concentrations  to  product  concentrations  through  differential 
relationships  involving  reaction  rate  constants.  Given  the  difficulty 
of  obtaining  the  numerical  values  of  kinetic  parameters  [19,26]  and 
standardization  of  the  parameters  and  models  [27],  the  applicability 
of  these  methods  is  limited  in  practice  to  small-scale  networks  [6,28]. 

Petri  Nets  have  also  been  used  for  simulating  the  dynamics  of 
signaling  networks  [29-31].  While  such  approaches  somewhat  relax 
the  necessity  for  biologically  exact  kinetic  parameters,  current  Petri 
Net-based  approaches  sdll  require  the  selection  of  weights  and/or 
probability  distributions  for  individual  interactions  in  the  model.  As  a 
result,  selecting  the  values  for  Petri  Net  parameters  presents 
challenges  similar  to  those  encountered  in  ODE  modeling. 

Structural  network  analysis  assumes  mainly  connectivity  infor¬ 
mation  about  the  model,  and  provides  insights  into  global,  static 
properties  of  the  network.  Dynamic  analysis  in  general  assumes 
numerical  values  of  the  kinetic  parameters,  and  provides  predictions 
of  network  dynamics  by  quantifying  the  change  in  concentration  and 
activity-level  (the  concentration  of  the  active  form  of  a  given  protein) 
of  the  individual  proteins  and  complexes  in  the  network.  To  obtain  a 
more  detailed  analysis  one  must  either  solve  parameter  optimization 
problems  for  a  large  number  of  molecules  and  interactions  or 
conversely  experimentally  derive  these  values. 

Given  the  difficulty  of  obtaining  numerical  values  of  kinetic 
parameters  [19,26]  and  the  implications  this  has  on  the 
applicability  of  dynamic  analysis  methods  [6],  it  is  imperative  to 
develop  innovative  approaches  that  combine  the  attractive  low 
requirements  of  structural  network  analysis  techniques  with  the 
detailed  answers  provided  by  dynamic  analysis  techniques — 
specifically  the  response  of  individual  proteins  to  signals  which 
travel  through  the  network. 

Several  recent  efforts  in  this  direction  have  produced  encour¬ 
aging  results.  An  approach  using  a  boolean  network  simulation 
method,  based  on  work  in  the  area  of  gene  regulatory  networks, 
successfully  used  only  signaling  network  connectivity  information 
to  predict  the  speed  of  signal  transduction  through  a  stomata 
signaling  network  [32].  The  use  of  piecewise  linear  systems  of 
ODEs  have  also  had  success  in  analyzing  some  of  the  dynamics  of 
gene  regulatory  and  signaling  networks  without  using  exact  kinetic 
parameters  (e.g.,  [33-35]).  The  obstacle  to  extending  the  method 
in  [32]  to  model  individual  protein  responses  to  signal  transduc¬ 
tion  is  the  boolean  model  used  to  discretize  the  signal  as  it 
propagates.  In  a  boolean  model,  the  signal  is  either  present  or 
absent  at  each  node  in  the  network.  Such  two-state  models  of 
signal  transduction  simplify  the  underlying  biochemistry  to  the 
point  where  it  is  difficult  to  model  changes  in  protein 
concentration  more  precisely  than  present  or  absent.  Modeling 
such  gradients  of  concentration  changes  and  the  effects  of  those 
changes  may  be  important  to  predicting  individual  protein 
responses,  motivating  our  effort  to  devise  more  fine-grained  ways 
to  model  and  simulate  the  dynamics  of  signaling  networks.  The 
challenges  to  using  linear-piecewise  ODEs  to  model  a  signaling 
network  center  around  the  issue  of  identifying  all  the  ODEs 
required  to  model  the  underlying  network  as  well  as  scalability 
issues  involved  in  simulating  large  systems  of  ODEs. 


In  this  paper,  we  extend  the  synchronized  Petri  net  model  and 
firing  policy  such  that  the  resulting  framework  models  cellular 
signaling  processes.  We  call  this  extension  the  signaling  Petri  net 
(SPN).  By  coupling  this  with  a  novel  strategy  for  Petri  net 
execution  and  sampling,  we  obtain  a  method  capable  of 
characterizing  some  dynamics  of  signaling  networks  while  using 
only  connectivity  information  about  these  networks. 

To  validate  our  method,  we  studied  the  MAPK1,2  and  AKT 
network  shown  in  Figure  1  in  two  breast  cancer  cell  lines.  This 
network  was  chosen  because  the  EGER  receptor  and  its 
downstream  signaling  network  play  a  very  important  role  in 
development,  differentiation,  and  oncogenic  transformation.  Two 
very  important  signaling  molecules  within  the  cell  are  MAPK  and 
AKT,  both  of  which  can  be  activated  by  EGER,  and  contains 
several  potential  regulatory  paths  between  them.  We  constructed  a 
model  network  of  EGE  regulation  of  MAPK  and  AKT  which 
includes  several  feedback  and  feed-forward  loops  all  of  which  were 
constructed  based  on  experimental  findings  from  different 
laboratories  around  the  world  [36-43].  We  analyzed,  both 
experimentally  and  computationally,  the  change  in  activity-level 
of  several  proteins  in  response  to  targeted  manipulation  of  TSC2 
and  mTOR-Raptor.  Using  the  model  network,  the  predictions 
from  our  method  agreed  with  experimental  results  in  over  90%  of 
the  cases,  and  in  those  where  they  did  not  agree,  our  method 
correcdy  identified  discrepancies  that  could  be  traced  back  to 
incompleteness  in  the  network  connectivity  model. 

Materials  and  Methods 

Our  approach  combines  elements  of  the  boolean  network 
simulator  in  [18]  with  a  synchronized  Petri  net  model  [44].  In  [18], 
Li  et  al.  present  a  non-parametric  approach  that  accurately  predicts 
the  speed  of  signal  propagation  through  a  network.  However,  as  their 
method  assumes  a  binary  model  of  activation — every  protein  is  either 
active  {true)  or  inactive  {fake) — modeling  a  range  of  activity-levels  is 
difficult.  Petri  nets,  while  able  to  model  concentrations  using  tokens, 
require  parameters  describing  the  kinetic  characteristics  of  the 
network,  which  are  typically  difficult  to  obtain. 

Our  method  models  signal  flow  as  the  pattern  of  token 
accumulation  and  dissipation  within  places  (proteins)  over  time 
in  the  Petri  net.  Transitions  in  the  network  represent  directed 
protein  interactions;  each  transition  models  the  effect  of  a  source 
protein  on  a  target  protein.  Through  transition  firings,  the  source 
can  influence  the  number  of  tokens  assigned  to  the  target,  called 
the  token-count,  modeling  the  way  that  signals  propagate  through 
protein  interactions  in  cellular  signaling  networks. 

In  order  to  overcome  the  issue  of  modeling  reaction  rates  in  the 
network,  signaling  dynamics  are  simulated  by  executing  the 
signaling  Petri  net  (SPN)  for  a  set  number  of  steps  (called  a  run) 
multiple  times,  each  time  beginning  at  the  same  initial  marking. 
Eor  each  run,  the  individual  signaling  rates  are  simulated  via 
generation  of  random  orders  of  transition  firings  (interaction 
occurrences).  When  the  results  of  a  large  enough  number  of  runs 
are  averaged  together,  we  find  that  the  series  of  token-counts 
correlate  with  experimentally  measured  changes  in  the  activity- 
levels  of  individual  proteins  in  the  underlying  signaling  network.  In 
essence,  the  tokenized  activity-levels  computed  by  our  method 
should  be  taken  as  abstract  quantities  whose  changes  over  time 
correlate  to  changes  that  occur  in  the  amounts  of  active  proteins 
present  in  the  cell.  It  is  worth  noting  that  some  of  the  most  widely 
used  experimental  techniques  for  protein  quantification — western 
blots  and  microarrays — also  yield  results  that  are  treated  as 
indications,  but  not  exact  measurements,  of  protein  activity-levels 
within  the  cell.  Thus  in  some  respects,  the  predictions  returned  by 
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our  SPN-based  simulator  can  be  interpreted  like  the  results  of  a 
western  blot  or  microarray  experiment  looking  at  changes  relative 
to  “control”. 

The  key  insight  behind  our  approach  is  the  assumption  that, 
while  all  network  parameters  determine  the  actual  signal 
propagation  to  some  extent,  the  network  connectivity  is  the  most 
significant  single  determinant.  While  this  is  clearly  a  gross 
simplification,  several  researchers  have  observed  that  the  connec¬ 
tivity  of  a  biological  network  dietates,  to  a  great  extent,  the 
network’s  dynamics  [18,45-47].  Some  have  conjectured  that 
biological  network  connectivities  have  evolved  to  have  a  stabilizing 
effect  on  the  overall  network  behavior,  making  the  network  more 
resilient  to  local  fluctuations  in  other  network  parameters  such  as 
reaction  rates  and  protein  binding  affinities  [45,47].  Here  we 
present  the  signaling  Petri  net  (SPN)  model  and  the  signaling  Petri 
net-based  simulator  whose  designs  collectively  utilize  this  assump¬ 
tion  and  couple  it  with  a  Petri  net  tokenization  scheme  that 
quantifies  the  ehanges  in  protein  aetivity-levels  that  occur  as 
signals  propagate  through  the  network.  In  the  following  seetions, 
we  describe  the  synchronized  Petri  net,  how  we  extended  it  to 
create  the  signaling  Petri  net,  and  a  novel  strategy  for  executing 
the  signaling  Petri  net  to  simulate  signaling  network  dynamics. 

Petri  Nets 

A  Petri  net  is  a  graph  that  consists  of  two  types  of  nodes,  places, 
and  transitions  [44] .  Edges  in  the  graph,  called  arcs,  are  directed  and 
connect  places  to  transitions  or  transitions  to  places.  Thus,  the 
Petri  net  is  a  bipartite  graph.  Formally,  a  Petri  net  is  a  4-tuple 
0,=  {P,T,I,0)  where 

P=  {p\,p2,---,pm}  is  the  set  of  places, 

T=  {ti,t2,...,t„}  is  the  set  of  transitions, 

1=  {?i,t2,...,4}  is  the  set  of  input  arcs  where  for  all  {u,v)el,  ueP 

and  veT,  and 

0  =  {oi,02,. .  .,0;}  is  the  set  of  output  arcs  where  for  all  {u,v)el,  ueT 

and  veP. 

In  order  to  simulate  a  dynamic  process,  a  number  of  tokens  is 
assigned  to  each  place  in  order  to  indicate  the  presence  of  some 
quantitative  property.  This  assignment  of  tokens  to  places  encodes 
the  state  of  the  system  and  is  called  a  marking,  denoted  m.  A 
marked  Petri  net,  R  =  ((^mo),  is  a  Petri  net  with  a  marking  mo,  called 
the  initial  marking.  For  the  remainder  of  this  paper,  the  term  Petri 
net  (PN)  refers  to  a  marked  Petri  net. 

Changes  in  the  state  of  the  system  are  simulated  by  executing  the 
Petri  net — evaluating  the  effect  of  transitions  on  the  marking  of  the 
network.  These  ehanges  in  marking  are  induced  by  sequential^nn^ 
one  or  more  transitions.  When  a  transition  fires,  it  removes  a  token 
from  each  place  connected  to  it  by  input  arcs  and  adds  a  token  to 
each  place  connected  to  it  by  output  arcs.  The  number  of  tokens 
removed  from  inputs  and  added  to  outputs  can  be  specified  by 
weighting  the  input  arcs.  However,  as  our  extension  does  not  use 
this  weighting  property,  we  do  not  consider  this  very  common  PN 
formulation  here. 

A  transition  can  only  fire  when  it  is  enabled,  meaning  that  each  of 
its  input  places  has  at  least  one  token  in  the  current  marking.  If  a 
transition  t,  when  fired  on  a  marking  mi,  produces  marking  m2, 
then  we  write  mi  |  t)m2. 

This  notation  can  be  extended  to  represent  the  effect  of  firing  a 
series  of  transitions.  A  firing  sequence,  ct  =  {ti,t2,...,tjj  is  a  sequence  of 
transitions.  The  sequence’s  cumulative  effect  on  the  system’s  state 
is  denoted  mo  |  a)my where  mo  is  the  initial  marking  and  ny  is  the 
marking  produced  by  the  firing  of  the  sequence  of  transitions  in 
the  order  specified  in  a.  In  this  paper,  we  write  to  indicate  the 


marking  produced  by  the  first  g  transitions  in  a.  Therefore,  in  the 
above  example,  tMq  =«Jo  and  =mf. 

For  a  more  complete  introduction  to  types  of  Petri  nets  and 
their  properties,  we  refer  the  reader  to  [44]. 

Synchronized  Petri  nets.  Synchronized  Petri  nets  model 
systems  in  which  the  firing  of  a  transition  is  triggered  by  a  speeific 
event  that  occurs  in  the  environment.  The  marked  Petri  net  is 
extended  to  inelude  a  set  of  these  events  and  a  mapping  funetion 
that  assigns  an  event  to  each  transition.  When  transition  t’s 
assigned  event  occurs,  transition  t  is  fired.  Formally,  a 
synchronized  Petri  net  is  a  3-tuple  {R,E,Sync),  where  [44]: 

R  is  a  marked  Petri  net, 

E=  {ei,«2,...,ej}  is  a  set  of  events,  and 

Sync'.T-^EiJ{e}  maps  each  transition  in  the  Petri  net  to  an 
event.  Event  e  is  the  always  occurring  event.  Any  transition  associated 
with  e  is  always  immediately  fired  upon  becoming  enabled. 

When  executing  a  synchronized  Petri  net,  transition  t  is  fired 
when  its  associated  event  e  =  Sync{t)  occurs.  The  order  in  whieh 
events  are  generated  depends  upon  the  environment  whieh 
generates  them.  Just  as  in  the  marked  Petri  net,  when  a  transition 
fires,  it  removes  one  token  from  each  place  connected  by  input 
arcs  and  gives  one  token  to  eaeh  place  connected  by  output  arcs. 

As  will  be  discussed  in  the  next  sections,  we  extend  the 
synehronized  Petri  net  paradigm  to  model  the  dynamics  of  a 
signaling  network.  To  our  knowledge,  ours  is  the  first  use  of  the 
synehronized  Petri  net  to  model  biochemical  systems.  In  prineiple 
it  is  well  suited  to  signaling  networks  since  places  represent 
proteins,  tokens  represent  concentrations,  and  transitions  represent 
directed  protein  interaetions.  A  model  of  signaling  event 
occurrence  can  be  used  to  generate  events  and  fire  transitions, 
providing  a  way  of  simulating  the  signaling  network’s  behavior. 
These  and  other  design  details  will  be  discussed  in  the  next  section. 

The  Signaling  Petri  Net-Based  Simulator 

A  high-level  sketch  of  our  simulator  is  given  is  Figure  2.  Details 
and  rationale  for  specific  design  decisions  will  be  discussed  in 
subsequent  sections. 

During  the  simulation,  the  input  signaling  Petri  net  is  executed 
multiple  times  on  a  firing  sequence  construeted  by  the  signaling 
event  generator.  The  signaling  event  generator  imposes  an 
ordering  on  transition  firing  such  that  it  creates  a  two-time  scale 
simulation.  The  smaller  time  scale  is  discretized  as  the  firing  of  a 
single  transition.  This  unit  is  referred  to  as  the  firing  time  scale. 
Firing  steps  are  nested  within  a  larger  time  scale,  called  time  blocks, 
in  which  each  transition  is  fired  exactly  once.  Thus,  there  are  |  T| 
firings  per  block.  Since  the  simulation  is  run  for  the  specified 
number  of  time  blocks,  B,  there  are  B  \  T\  firing  steps  in  the 
simulation. 

The  time  structure  for  an  example  simulation  is  illustrated  in 
Figure  3.  This  dual-time  approach  is  necessitated  by  the  rate 
parameter  sampling  strategy  we  employ.  Since  the  rate  parameters 
are  not  known,  our  method  executes  many  simulation  runs  (Step  2  in 
Figure  2)  in  order  to  sample  the  space  of  possible  rate  parameters. 
The  markings  returned  by  these  runs  are  then  averaged  (Step  3  in 
Figure  2).  The  only  requirement  plaeed  on  the  different  rate 
parameter  values  is  that  all  events  occur  within  the  same  larger  time 
frame — the  time  bloek.  Therefore,  within  every  time  block  all  edges 
are  evaluated  once,  though  not  necessarily  in  the  same  order. 

This  idea  of  evaluating  random  event  orderings  within  a  two- 
time  scale  system  has  appeared  before  in  the  domain  of 
transcriptional  networks  [48].  In  that  study,  Chaves  et  al. 
employed  a  two-time  scale  formulation  of  network  updates  similar 
in  concept  to  the  one  we  describe  here.  In  their  work,  they 
assumed  a  boolean  model  of  regulation  and  charaeterized  the 
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Procedure  Simulate(S,  B,  r) 

1.  Set  the  ioitial  marking  of  S 

2.  For  i  =  1  to  r 

(a)  Generate  a  random  sequence  of  the  events  in  E 

(b)  Simulate  the  network  by  executing  the  transitions  associated  with  the  events  in 
the  generated  sequence 

(c)  Record  the  number  of  tokens  at  each  node,  for  each  time  step 

3.  For  each  node,  compute  the  number  of  tokens  at  each  time  unit  f,  averaged  over  r 


Figure  2.  A  High-Level  Outline  of  the  Procedure  for  Simulating  a  Signaling  Network.  The  input  to  the  procedure  is  a  signaling  Petri  net,  S, 
the  number  of  time  units  to  simulate  the  network  for,  B,  and  the  number  of  runs  for  which  to  repeat  the  simulation,  r.  The  random  generation  of 
event  ordering  is  employed  to  simulate  the  stochasticity  in  reaction  rates  and  the  differing  times  of  signal  arrivals. 
doi:1 0.1 371/journal.pcbi.1 000005.g002 


effect  of  different  relative  rates  of  transcription  within  the  same 
network  on  the  final  steady  state  reached.  In  contrast,  our  method 
is  designed  to  operate  on  tokenized  models  of  signaling  networks 
with  the  ultimate  intent  of  predicting  the  activity-level  changes  of 
proteins  in  the  underlying  signaling  network  over  time. 

In  the  next  sections,  we  discuss  in  greater  detail  the  core  design 
decisions  underlying  our  method:  the  signaling  Petri  net,  transition 
firing,  signaling  network  event  generator,  constructing  the  initial 
marking  for  the  model,  and  sampling  signaling  rates.  We  then 
discuss  how  our  strategy  can  be  used  to  predict  the  outcome  of 
perturbation  experiments. 

The  Signaling  Petri  Net 

The  goal  of  our  method  is  to  predict  the  signal  flow  through  a 
cell-specific  network  under  specific  experimental  conditions.  As  a 
result,  the  signaling  Petri  net  model  must  characterize  the 
connectivity  of  the  signaling  network,  the  connectivity-level 
network  properties  that  are  unique  to  the  cell  type  and 
experimental  conditions  under  which  the  network  is  being  studied, 
and  the  signaling  processes  of  activation  and  inhibition. 

The  signaling  Petri  net  is  a  synchronized  Petri  net  with:  1)  a 
specifie  way  of  modeling  activating  and  inhibiting  interactions 
using  places,  transitions,  and  arcs;  2)  a  one-to-one  correspondence 
between  events  and  transitions  such  that  every  transition  is 
associated  with  a  unique  event;  3)  modified  rules  regarding  how 
many  tokens  are  moved  in  response  to  a  transition  firing;  and  4)  a 
signaling  network  event  generator. 

Places  correspond  to  the  activated  forms  of  signaling  proteins. 
The  number  of  tokens  assigned  to  place  p  in  marking  m,, 
abstraedy  represents  the  amount  of  active  protein  p  present  in  that 


network  state.  Signaling  interactions  are  modeled  using  transitions 
and  their  conneeted  input  and  output  arcs.  Each  transition,  t,  is 
associated  with  a  unique  signaling  event,  e,  such  that  when  e 
occurs,  transition  t  fires.  Figure  4  shows  the  equivalent  signaling 
Petri  net  for  a  signaling  network. 

Formally,  a  signaling  Petri  net  is  a  3-tuple  S  =  {R,E,Sync),  where: 

R  is  a  marked  Petri  net, 

E  is  a  set  of  signaling  events  such  that  |  £  |  =  |  T|  and  there  is  no 
always  occurring  event,  and 

Sync'.T-^E  is  a  one-to-one  mapping  which  assigns  each 
transition  a  unique  signaling  event. 

The  initial  marking  of  a  signaling  Petri  net,  mo,  represents  the 
state  of  rest  from  which  the  network  is  starting  and  being  simulated. 
Proteins  whose  concentrations  are  known  to  be  high  can  be  given  a 
large  number  of  tokens,  and  those  whose  concentrations  are  known 
to  be  low  can  be  assigned  few  or  zero  tokens.  Attention  to  the  initial 
marking  is  central  to  modeling  cell-specific  networks.  In  many  cell 
lines,  specific  proteins  are  known  to  contain  mutations  that  render 
them  perpetually  active  or  inactive  [49] .  Furthermore,  experimental 
studies  frequently  involve  the  targeted  manipulation  of  various 
proteins  within  the  network.  Both  of  these  phenomena  induce  state 
changes  in  certain  proteins  at  various  time  points  that  must  be 
modeled.  The  way  in  which  these  are  modeled  will  be  discussed 
when  the  simulator  design  is  explained. 

Transition  Firing 

When  a  signaling  interaction  A-^B  (A  activates  B)  or  AHB  (A 
inhibits  B)  occurs,  it  has  the  effect  of  changing  the  state  of  the  system 
by  modifying  the  activity-level  of  A  and/ or  B.  Thus,  in  the  SPN 
used  to  model  this  network,  the  associated  transition,  t,  will  fire  at 
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Figure  3.  The  Effects  of  Reaction  Rates  on  Signal  Propagation.  (A)  By  changing  the  speed  of  signaling  edge  3,  the  value  of  D  at  the  end  of  a 
single  simulation  step  can  be  reversed.  If  edge  3  is  slower  than  the  cascade  B^CID,  then  D  will  be  active.  If  edge  3  is  faster  than  the  cascade,  then  D 
will  be  inactive.  (B)  An  example  of  how  the  simulator  might  evaluate  the  individual  edges  during  a  run.  In  each  time  block,  every  edge  is  evaluated 
once.  Each  edge  evaluation  corresponds  to  one  time  step.  Note  that  the  order  of  the  edge  evaluation  is  shuffled  during  each  time  block  in  order  to 
sample  the  space  of  possible  relative  signaling  rates. 
doi:1 0.1 371/journal.pcbi.1 000005.g003 
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Figure  4.  An  Example  Signaling  Network  and  Its  Corresponding  Petri  Net.  An  example  signaling  network  (A)  and  its  corresponding  Petri  net 
(B).  Each  signaling  protein  in  the  network,  A,  B,  and  C,  are  designated  as  places  Pa,  Pb,  and  po  Signaling  interactions  become  a  transition  node  and  its 
input  and  output  arcs.  Note  that  the  connectivity  for  an  activating  edge  differs  from  that  of  an  inhibitory  edge. 
doi:1 0.1 371/journal.pcbi.1 000005.g004 


time  T  and  produce  marking  m.j+i  from  m.^.  The  way  in  which 
m.j+1  is  computed  from  depends  on  the  set  of  input  and  output 
arcs  attached  to  the  transition  as  well  as  the  number  of  tokens 
moved  by  the  transition. 

The  combination  of  input  and  output  arcs  connected  to  a 
transition  is  determined  exclusively  by  the  type  of  interaction  and 
the  transition  firing  model.  However,  different  topologies, 
combinations  of  input  and  output  arcs,  are  needed  to  model  the 
different  biochemical  processes  that  mediate  protein-protein 
interactions  in  a  signaling  network.  Here  we  examine  four  of  the 
most  common  biochemical  processes,  identify  the  corresponding 
topological  motifs,  and  ultimately  devise  a  modeling  policy  best 
suited  for  non-parametric  simulation  of  signal  flow. 

In  post-translational  modification  (PTM),  a  protein  mediates  the 
addition  or  removal  of  a  phospho  group  at  a  specific  phosphor¬ 
ylation  site  on  another  protein.  In  GTP/ATP  binding,  a  protein 
triggers  the  exchange  of  GDP  (ADP)  from  GTP  (ATP)  on  another 
protein.  In  a  recruitment  process,  a  protein  mediates  the  relocaliza¬ 
tion  of  another  protein  to  a  different  part  of  the  cell.  Finally,  in  a 
complexing  process,  a  protein  binds  to  another  protein  to  create  a 
complex,  which  can  then  participate  in  other  reactions.  In  the  first 
two  processes,  the  mediating  protein  usually  acts  as  an  enzyme 
that  participates  in  the  reaction  but  is  not  consumed  by  the 
reaction.  In  the  latter  two  processes,  the  participating  protein  often 
becomes  unavailable  to  other  reactions,  transiently  while  the 
protein  recruitment  is  taking  place  and  for  longer  durations  when 
complexing  occurs.  To  model  these  two  cases,  we  identified  the 
two  different  token-passing  policies  implemented  by  the  different 
topological  motifs  depicted  in  Figure  5. 

Token  consumption.  In  this  policy,  uHv  consumes  tokens  in  u 
in  order  to  generate  new  tokens  for  v.  In  order  to  model  this,  p^  is 
connected  to  transition  ti  through  an  arc  and  Pv  is  connected  to  ti 
through  an  output  arc.  When  ti  fires,  some  number  of  tokens  in  p^ 
are  moved  into  Pv.  Similarly,  uHv  consumes  tokens  in  u  in  order  to 
consume  tokens  in  v.  This  is  modeled  by  connecting  pu  to  tz  with  an 
input  arc  and  Pv  to  t2  with  an  input  arc.  When  tz  fires,  some  number 


of  tokens  are  removed  from  both  pu  and  Pv.  This  policy  models  a 
recruitment  or  complexing  event  in  which  u  binds  to  another 
molecule,  thereby  creating  a  molecule  of  type  v.  A  molecule  of  type  u  has 
been  consumed  in  order  to  generate  or  deactivate  a  molecule  of  type  v. 

Token  conservation.  In  this  policy,  uHv  generates  new 
tokens  for  v  while  conserving  those  in  u.  In  order  to  model  this,  Pu 
is  connected  to  transition  ts  through  a  read  arc.  Node  Pv  is 
connected  to  G  through  an  output  arc.  When  ts  fires,  some 
number  of  tokens  in  pu  is  read  (but  not  removed)  and  copied  into 
Pv.  Similarly,  uHv  consumes  tokens  in  v  while  conserving  those  in 
u.  This  is  modeled  by  connecting  pu  to  t4  with  a  read  arc  and  Pv  to 
t4  with  an  input  arc.  When  t4  fires,  some  number  of  tokens  in  p^, 
are  read  and  removed  from  pv.  Enzymes  will  often  behave  in  this 
way:  inducing  a  change  in  a  molecule  (v)  without  themselves 
undergoing  any  change.  A  molecule  of  u  has  induced  a  change  in  a 
different  molecule  of  type  v  without  itself  changing  state. 

Ideally,  for  each  interaction  in  the  network,  the  associated 
transition  could  be  embedded  in  the  topology  corresponding  to  the 
interaction’s  underlying  biochemical  mechanism.  However,  connec¬ 
tivity-level  knowledge  of  the  network  does  not  provide  this 
information  for  each  interaction.  In  the  absence  of  these  details, 
we  use  one  token-passing  policy  for  all  interactions  in  the  network. 
We  implemented  and  tested  both  the  consuming  and  conserving 
policies  and  found  that  token  conservation  provides  significantly 
more  accurate  results  when  compared  to  experimentally  derived 
data.  This  is  not  surprising,  as  post-translational  modification  and 
GTP/ ATP  binding  events  are  responsible  for  many  activation  state 
changes  in  signaling  networks  [1,50-52].  It  is  worth  noting  that  our 
approach  does  not  restrict  the  net  structure  to  token  conserving 
topologies.  Thus,  it  is  possible  to  use  the  token  consumption 
topologies  where  such  processes  are  known  to  occur.  However,  as 
our  focus  in  this  paper  is  designing  a  purely  non-parametric 
simulation  method,  we  consider  the  use  of  information  regarding  the 
biological  mechanism  of  signaling  as  a  potential  way  to  further 
improve  the  accuracy  of  our  method’s  predictions  and  identify  this  as 
a  direction  for  future  work. 
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Figure  5.  The  Topological  Motifs  for  Differing  Signaling  Processes.  (A)  The  token  consumption  motifs  for  complexing  and  recruitment. 
Transition  ti  encodes  activation  of  v  by  the  binding  or  consumption  of  u.  Transition  t2  encodes  deactivation  of  v  by  the  binding  or  consumption  of  u. 
In  both  cases,  the  number  of  tokens  of  Pu  decreases  immediately  after  transitions  ti  and  t2  fire.  (B)  The  token  conserving  motifs  for  PTM  and  GTP/ATP 
binding.  Transition  t3  encodes  enzymatic  activation  of  v  by  u.  Transition  t4  encodes  enzymatic  inhibition  of  v  by  u.  In  both  cases,  the  number  of 
tokens  of  Pu  remains  unchanged  immediately  after  transitions  ts  and  t4  fire. 
doi:1 0.1 371/journal.pcbi.1  OOOOOS.gOOS 


The  transition  topologies,  as  described  above,  do  not  designate 
how  the  number  of  tokens  added  to  or  removed  from  is 
determined.  However,  we  know  that  in  biochemical  signaling 
networks  concentration  has  an  effect  on  the  strength  of  a 
signaling  event  [53-55].  Specifically,  the  higher  u’s  concentra¬ 
tion,  the  stronger  its  effect  on  v — the  more  tokens  that  Pu  has, 
the  more  tokens  of  Pv  should  be  affected  (generated  or 
consumed). 

However,  because  of  the  stochastic  nature  of  the  underlying 
biochemistry,  it  would  be  inaccurate  to  assume  that  all  active  u 
molecules  will  always  participate  in  an  interaction  with  v.  In  order 
to  accommodate  this  observation,  when  transition  t  fires,  we 
randomly  select  the  number  of  Pu’s  tokens  to  be  involved  in  the 
subsequent  evaluation  of  the  transition,  which  we  call  a  signaling 
event.  Note  that,  according  to  our  choice  of  topology,  Pu  can  always 
be  identified  as  the  node  connected  to  the  transition  by  a  read  arc. 
In  this  paper,  we  assume  a  uniform  distribution  for  selecting  the 
number  of  tokens  involved  in  a  given  signaling  event,  but 
acknowledge  that  other  distributions  may  be  more  appropriate 
under  certain  circumstances  and  identify  this  as  a  topic  deserving 
further  consideration. 

Let  mjgi)  denote  the  number  of  tokens  in  node  x  at  time  s.  For  an 
interaction  (u,v),  under  the  token  conservation  policy  detailed 
above,  u’s  token-count  remains  unchanged  after  the  firing  of  t, 
whereas  v’s  token-count  is  updated  based  on  the  following 
formula: 

{  ms-\{v) -\-random(Q,ms-\(u))  if  u  activates  v 

»is(r)  =  <  , 

[  max{0,»jj_i  (m)  —  ra«tfo»j(0,»j.5_i(M))}  if  u  inhibits  v 

where  random{p,q)  is  a  random  integer  drawn  from  a  uniform 
distribution  over  the  range  [p,q]. 

If  we  employ  the  policy  of  token  passing  with  consumption,  then 
after  m,(r)  has  been  computed  based  on  the  formula  above,  m,{u)  is 
updated  as: 

mfu)  =ms-\(u)  —  Tmn{ms-]{u),\ms{v)  —  (r)|}. 


Signaling  Network  Event  Generator 

The  SPN  topology  and  transition  token-number  selection  policy 
alone  do  not  specify  the  speed  with  which  individual  signaling 
interactions  occur.  However,  such  rates  must  be  accounted  for 
when  simulating  a  signaling  network.  ODEs  characteristically 
model  such  details  as  reaction  rate  constants;  parameterized  Petri 
nets  specify  these  in  a  variety  of  ways  including  transition  firing 
rates  and  firing  probabilities  [17,30].  In  synchronized  Petri  nets, 
the  environment  controls  the  generation  of  events.  Thus,  the 
signaling  network  event  generator  is  responsible  for  controlling  the 
timing  and  ordering  of  signaling  events.  However,  as  our  objective 
is  a  non-parametric  simulation  method,  our  approach  must  either 
estimate  these  parameters  or  operate  without  explicit  knowledge  of 
them. 

Estimating  reaction  rates  using  only  connectivity  is  currently 
beyond  the  predictive  or  inferential  capabilities  of  computers. 
While  there  has  been  some  work  in  the  area  of  predicting  reaction 
rates,  all  results  of  which  we  are  aware  require  knowledge  about 
the  mechanism  of  signaling  (e.g.,  [56]).  As  a  result,  without 
enriching  the  SPN  model,  it  is  doubtful  that  rate  parameters  can 
be  accurately  estimated. 

For  this  reason,  the  signaling  network  event  generator  operates 
without  explicit  knowledge  of  the  rate  parameters.  To  compensate 
for  this  “missing”  knowledge,  we  make  use  of  an  observation  of 
signaling  networks  discussed  earlier:  a  network’s  connectivity 
determines  its  dynamics.  Several  studies  have  found  that  the 
connectivity  of  biochemical  networks  desensitizes  them  to  small 
fluctuations  in  the  kinetic  biochemical  parameters  [45-47]. 
Understood  within  the  context  of  evolution  -  a  stochastic  process 
that  tweaks  signaling  network  parameters  across  generations  -  this 
is  a  highly  desirable  property  as  it  ensures  that  an  offspring 
remains  viable  despite  fluctuations  in  the  exact  tuning  of  its  cellular 
machinery.  If  this  property  holds,  then  small  fluctuations  in  the 
rate  parameters  should  have  a  marginal  effect  on  the  overall 
propagation  of  signal  through  the  network.  We  can  consider  these 
small  effects  to  be  noise  obscuring  the  underlying  dynamics  of  the 
network  connectivity.  By  taking  many  samples  of  the  network 
dynamics  under  a  variety  of  reaction  rate  assignments  and  then 
averaging  these  dynamics,  we  simultaneously  reduce  the  noise 
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introduced  by  any  one  rate  assignment  and  strengthen  the 
underlying  dynamic  characteristics  of  the  network’s  connectivity. 

However,  since  reaction  rate  constants  can  vary  by  several 
orders  of  magnitude — from  10  to  10^,  the  task  of  correcdy 
selecting  parameters  close  to  the  true  parameters  is  non-trivial.  In 
fact,  without  having  some  estimate  of  the  actual  rate  parameters,  it 
is  unclear  as  to  how  to  measure  closeness  at  all.  Clearly,  these  are 
among  the  issues  that  make  parameter  estimation  so  difficult  for 
ODE  and  Petri  net  approaches.  Since  our  comparisons  will  be 
relative  and  not  absolute,  we  take  a  relative  approach  to  modeling 
rate  parameters.  The  space  of  possible  rate  values  is  the  space  of 
possible  signaling  event  orderings. 

This  idea  is  illustrated  in  Figure  3A.  Protein  A  affects  the  activity 
of  protein  D  through  two  separate  pathways.  Assuming  that  A  is 
active  to  begin  with,  the  relative  speed  of  these  two  pathways 
determines  the  final  activity  of  D.  If  the  pathway  through  G  is 
faster  than  the  pathway  BHD,  then  D  will  be  active.  However,  if 
the  pathway  speeds  are  reversed,  then  D  will  remain  inactive.  The 
overall  outcome  of  this  network  can  be  represented  without  any 
use  of  numeric  reaction  rates  by  representing  the  reaction  rates  as 
an  ordering  over  all  the  edges  in  the  network.  We  can  extend  this 
idea  to  the  SPN  by  observing  that  there  exists  a  unique  event  for 
each  signaling  edge  in  the  signaling  network. 

This  sampling  strategy  is  the  motivation  for  the  dual-time 
framework  depicted  in  Figure  3B  and  implemented  by  the 
signaling  network  event  generator  shown  in  Figure  6.  Time  blocks 
are  the  larger  time  intervals  during  which  every  signaling  event 
occurs  exactly  once.  Since  every  transition  in  the  SPN  is  associated 
with  a  unique  event,  each  transition  will  fire  exactly  once  in  each 
time  block.  Transition  firings  are  the  smaller  time  units  that  impose  a 
strict  sequential  order  on  the  occurrence  of  signaling  events.  While 
this  strict  sequentiality  of  firing  models  relative  reaction  rates,  it 
also  discretizes  the  effect  of  signaling  events.  Though  this  is 
consistent  with  the  definition  of  transition  firing  in  discrete  time 
Petri  nets  (only  one  transition  is  evaluated  at  a  given  point  in  time) 
[44],  in  biological  signaling  networks  there  is  no  such  serial 
evaluation  constraint.  However,  our  validation  with  experimental 
data  suggests  that  this  discretization  approximation  does  not  affect 
the  overall  validity  of  the  simulation  results. 

Procedure  GenerateSignalingEvents(£,  n) 

k  =  |I5| 

2.  (7  an  an  empty  array  of  size  {k  x  n) 

3.  i  =  1 

4.  for  6  =  1  to  n 

(a)  E’  =  E 

(b)  while  E'  ^9 

i.  e  =  a  random  event  from  E' 

ii.  (7[i)  =  e 

iii.  E'  =  E'-{e} 

iv.  i  =  i  -I- 1 

5.  Return  cr 


Figure  6.  The  Algorithm  That  Implements  the  Signaling 
Network  Event  Generator.  This  routine  generates  the  time  block/ 
firing  structure.  Given  a  set  of  events,  E,  and  the  number  of  blocks  for 
which  the  SPN  will  be  executed,  n,  GENERATESicNAUNGEvENts  generates  n 
blocks  of  events,  each  consisting  of  |E|  events  ordered  randomly.  In 
each  block,  every  event  in  E  occurs  exactly  once. 
doi:1 0.1 371/journal.pcbi.1 000005.g006 


Defining  the  Initial  State 

As  mentioned  previously,  the  initial  state  of  the  SPN  is  the  initial 
marking,  mo.  As  the  SPN  provides  no  explicit  information  on  how 
this  marking  should  be  budt,  we  propose  three  ways  to  eonstruct 
the  initial  state:  zero,  basal,  or  experimentally  derived.  In  a  zero 
initial  state,  the  simulator  initializes  all  proteins  to  have  zero 
tokens.  The  basal  initial  state  is  a  random  distribution  of  aetivation 
levels  intended  to  model  the  eell  when  no  impulses  due  direetly  to 
external  stimuli  are  propagating  through  the  signaling  network. 
Though  a  basal  network  is  eonsidered  at  rest,  in  general  it  will  not 
have  a  zero  marking  since  signal  flows  are  known  to  oecur  even  in 
unstimulated  signaling  networks  through  autocrine  and  paracrine 
seeretions  by  the  eells.  The  experimentally  derived  initial  state  is 
based  on  knowledge  about  the  activity  levels  of  various  proteins 
just  prior  to  the  addition  of  the  external  stimuli. 

When  accurate  experimental  data  is  available  such  as  results  from 
microarrays  or  western  blots,  the  experimentally  derived  initial  state 
may  be  the  most  aeeurate.  A  ehallenge  in  using  experimental  data, 
however,  is  determining  how  best  to  assign  numbers  of  tokens  based 
on  the  experimentally  observed  activity  levels. 

In  the  absence  of  reliable  experimental  data,  the  basal  initial 
state  seems  more  accurate  than  the  zero  initial  state.  However,  it 
presents  the  challenge  of  properly  seleeting  the  basal  aetivity-levels 
to  assign  to  eaeh  protein  in  the  model  network.  In  [18],  a  basal 
initial  state  was  eonstructed  by  activating  a  small  number  of 
randomly  selected  proteins  in  the  signaling  network.  However,  the 
work  in  [18]  was  done  using  a  boolean  model.  Translating  this 
approach  into  a  tokenized  model  creates  the  additional  eomplexity 
of  determining  how  many  tokens  eaeh  basally  active  protein 
should  receive.  The  correct  values  are  likely  to  depend  on  the 
specific  signaling  network  and  experimental  eonditions. 

We  performed  preliminary  tests  to  eompare  the  effect  of  using 
different  basal  versus  zero  markings  on  the  outeome  of  the 
simulator.  We  found  that  the  basal  and  zero  states  produced 
indistinguishable  predictions  so  long  as  less  than  30%  of  the 
proteins  were  aetivated  and  a  small  number  of  tokens  (<5)  were 
used  when  construeting  the  basal  marking.  This  is  not  as  surprising 
as  it  may  seem  at  first.  Inhibitory  edges  will  quickly  consume  a 
small  number  of  tokens  scattered  throughout  the  network, 
effectively  returning  much  of  the  network  to  the  zero  state  before 
a  stimulation  event  can  propagate  through. 

Furthermore,  while  validating  our  method,  we  also  compared 
the  predictions  produced  by  SPNs  based  on  a  zero  initial  state  and 
experimentally  derived  initial  state.  These,  too,  did  not  produce 
noticeably  different  final  results  for  similar  reasons  as  discussed 
above.  Details  of  these  comparisons  will  be  discussed  further  in  the 
Results  and  Discussion  sections. 

However,  since  all  three  initial  state  construction  strategies  yield 
qualitatively  identical  predictions,  using  zero  initial  states  has  the 
advantage  of  invoking  the  fewest  unnecessary  assumptions  about 
the  network  (as  in  the  case  of  the  basal  initial  state)  and  requiring 
the  least  experimental  data  (as  in  the  case  of  the  experimentally 
derived  state).  Nonetheless,  in  our  implementation  of  the  tool,  we 
allow  for  using  any  one  of  these  three  initial  state  construction 
strategies. 

Modeling  Cell-Specific  Signaling  Networks 

Whereas  consensus  signaling  networks  typically  represent  the 
connectivity  in  normal  cells,  many  experiments  are  conducted  on 
abnormal  cells  in  which  oncogenic  mutations,  gene  knockouts,  and 
pharmacological  inhibitors  have  altered  the  behavior  of  various 
signaling  nodes  in  the  network.  In  an  SPN,  these  alterations  to  the 
signaling  network  can  be  modeled  by  adding/ removing  transitions 
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(and  associated  iiiput/output  arcs)  and  explicitly  setting  the  token 
count  for  various  proteins  in  the  initial  state. 

The  two  network  alterations  which  are  commonly  induced  by 
oncogenic  mutations,  gene  knockouts,  or  pharmacological  inhib¬ 
itors  are  constitutively  high  or  low  protein  activity-levels,  meaning 
that  a  protein  is  either  unable  to  be  inhibited  or  unable  to  be 
activated.  The  simulator  allows  for  proteins  to  be  specified  as 
either  fixed  High  or  Low.  Here  we  explain  how  these  are  modeled 
by  changes  to  the  SPN. 

If  protein  u  is  fixed  high,  then  this  protein  cannot  be  inhibited. 
Thus,  all  transitions  that  remove  tokens  from  p„  are  removed  from 
the  SPN.  The  fact  that  u  is  high,  however,  also  suggests  that  it 
maintains  a  higher  activity  level  in  general.  Therefore,  in  the  initial 
state,  =  H,  where  H  is  a  non-zero  number  of  tokens.  Since 

all  inhibiting  transitions  have  been  removed  from  the  SPN, 
throughout  any  execution,  place  p^  will  always  have  at  least  H 
tokens. 

In  experiments,  we  have  observed  that  the  choice  of  the  value  of 
H  does  not  change  the  relative  outcome  of  the  simulations.  While 
H  will  affect  the  actual  number  of  tokens  present  in  a  given  place 
as  well  as  the  number  of  time  blocks  required  to  observe  certain 
activity-level  changes,  the  relative  changes  in  activity-level 
(number  of  tokens)  among  different  proteins  (places)  does  not 
change.  As  a  result,  one  is  free  to  select  any  reasonable  value  of  H 
(for  our  experiments,  we  used  H  =  1 0)  as  long  as  this  H  is  held 
constant  across  all  simulations  whose  results  will  be  compared. 

If  protein  u  is  fixed  low,  then  this  protein  cannot  be  activated. 
Thus,  all  transitions  that  add  tokens  to  Pu  are  removed  from  the 
SPN.  The  fact  that  u  is  low,  however,  also  suggests  that  it 
maintains  a  constantly  low  activity  level  in  general.  Therefore,  in 
the  initial  state,  m()(/)„)  =  L,  where  L  is  a  small  number  of  tokens  (in 
our  simulations  we  use  L  =  0).  Since  p^  is  only  inhibited,  we 
observed  that  all  constitutively  low  proteins  quickly  had  their 
marking  reduced  to  zero. 

Unlike  the  value  of  H,  extra  caution  must  be  taken  when 
selecting  values  for  representing  L.  A  value  of  L  that  is  too  large 
can  destabilize  the  early  propagation  of  signal  through  the 
network.  In  our  experiments,  we  obtained  best  results  for  values 
of  L  very  close  to  or  equal  to  zero  (T^2).  Beyond  this,  the  final 
results  obtained  depended  on  other  values  in  the  network,  the 
strength  of  the  signal,  and  the  duration  of  the  simulation. 

Simulating  a  Signaling  Network 

Figure  7  provides  more  detailed  versions  of  the  simulation 
algorithm  outlined  in  Figure  2.  Steps  1  and  2  of  the  Simulate 
procedure  constructs  the  initial  marking  and  net  topology  to 
incorporate  perpetually  high  proteins,  H,  and  perpetually  low 
proteins,  L.  In  this  paper,  proteins  that  are  assigned  high  activity- 
levels  receive  an  initial  token  count  of  10  in  order  to  model  a 
higher-than-average  initial  activity-level.  As  discussed  earlier, 
using  other  values  of  H  scale  the  activity-levels  of  all  the  proteins 
in  the  network,  but  will  not  qualitatively  change  their  relative 
activities. 

The  loop  in  Step  3  runs  r  individual  simulation  runs.  Each  run 
receives  a  different  event  ordering,  a‘\  thereby  implementing  the 
interaction  rate  sampling  strategy.  The  time  block/ step  structure  is 
contained  within  the  ordering  ct''  (see  Figure  6C).  As  a  result,  the 
SPN  execution  step  simulates  the  events  by  firing  their  associated 
transition.  Only  those  markings  that  correspond  to  time  block 
boundaries  are  sampled. 

After  Simulate  finishes  collecting  the  time  block  markings  from 
all  the  runs.  Step  4  computes  the  average  markings  for  each  time 
block  and  Step  5  returns  these  averages. 
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Procedure  Simulate(5 

H,  L,  B,  r) 

1. 

For  each  p  6  // 

•  tno(p)  =  10; 

1 

11 

• 

t  &  T  and  (t,p)  ^  O] 

2. 

For  each  p  £  L 

•  tno(p)  =  0; 

•  ^  =  -^^-  {(«.P) 

ter}; 

3. 

for  i  =  1  to  r 

•  ff'  =  GenerateSignalingEvcnts(£l,  B); 

•  Execute  mglcr} 

4. 

For  each  p  &  P  and 

[)<6<B 

•  mtCp)  =  ?  EI'= 

5. 

Return  (mi,  m2, . . . 

mfl) 

Figure  7.  The  Procedure  for  Simulating  a  Signaiing  Petri  Net. 

Simulate  predicts  the  signal  flow  through  the  SPN  S.  The  simulation  is 
run  for  B  time  blocks;  the  results  of  r  runs  are  averaged  to  produce  the 
final  result.  Most  of  the  work  is  done  by  the  signaling  Petri  net 
execution  procedure  detailed  in  the  preceding  sections.  This  execution 
actually  performs  an  individual  run.  This  procedure  takes  the  initial 
marking,  mo,  and  applies  the  sequence  of  transitions  triggered  by  the 
event  sequence,  a''.  This  ordering,  generated  by  the  algorithm  in 
Figure  6,  has  the  dual  time  structure  in  which  each  block  of  edges 
contains  every  event  in  E  exactly  once.  Each  firing  evaluates  the  effect 
of  one  transition.  The  markings  at  the  end  of  each  time  block  are 
extracted  in  Step  5. 
doi:10.1371/journal.pcbi.1000005.g007 

Simulating  a  Perturbation  Experiment 

We  tested  the  accuracy  and  performance  of  our  method  by 
simulating  the  effect  of  two  different  targeted  manipulations  to  a 
well-known  signaling  network.  We  compared  these  predictions  to 
experimental  results  produced  by  performing  the  actual  manip¬ 
ulations  on  two  separate  cancer  cell  lines. 

The  perturbations  we  considered  in  this  study  altered  the 
constitutive  activity-level  of  various  proteins  in  the  network  (as 
opposed  to  affecting  specific  signaling  interactions).  Therefore,  we 
modeled  the  perturbations  as  changes  in  the  high  and  low 
proteins — and  for  the  control  (unperturbed)  network  and  H’’ 
and  L*’  for  the  perturbed  network. 

A  variant  of  the  Simulate  method  was  required  to  quantify  how 
a  perturbation  changed  the  protein  token-counts  for  each  time 
block.  Figure  8  shows  the  algorithm  we  used.  In  the  procedure 
DieeerentialSimulate,  the  input  S  provides  the  consensus  SPN. 
Inputs  and  L'^  specify  the  control  high  and  low  proteins,  the 
inputs  H’’  and  L’’  specify  the  perturbed  high  and  low  proteins. 
After  Steps  1-5  construct  two  separate  SPNs  for  the  control  and 
perturbed  conditions,  the  loop  in  Step  6  performs  r  independent 
simulations  over  the  control  and  perturbed  models.  Step  6d 
computes  the  difference  between  the  markings  at  the  end  of  each 
time  block  in  the  perturbed  and  control  networks.  The  marking 
difference  rfj  =  tJiy  —  /nj  yields  the  marking  rfj  where  d‘i  ( v)  = 
n^{v)—m‘i[v)  for  each  veP.  Following  the  loop,  the  marking 
differences  are  averaged  to  obtain  the  time  series 
where  ^4(11)  is  the  average  change  in  the  token-count  for  protein  v 
at  time  block  b. 

For  values  of  |  4 1  >0  for  a  given  molecule  v,  we  can  conclude 
that  the  perturbation  caused  a  change  in  the  activity-level  of  v  at 
time  block  b  only  if  the  difference  observed  is  statistically 
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Procedure  DifferentialSimulate(S,  H 

H'.  Lf.  B.  r) 

\. 

S'  =  S,  SF  =  S; 

2. 

For  each  p  € 

(a)  m5(p)  —  10  and  =s  - 

-{(p.O 

.  (  e  IT  and  ((,p)  ^  O'} 

3. 

For  each  pG 

(a)  mo(p)  =  0  and  - 

{(t.p)  : 

<eT'}; 

4. 

For  each  v  € 

(a)  mo(t;)  =  10  and  =  /** 

:  (  6  TF  and  (t,  ti)  jT  O'} 

5. 

For  each  v  ^  U* 

(a)  mo(u)  —  0  and  /*’  =  /*’  — 

teTF}; 

6. 

for  i  =  1  to  r 

(a)  (T*  =  GcncratcSignalini?EvcnU(£ 

T); 

(b)  Execute 

(c)  Execute  niSI<r)mF 

(d)  For  j  =  0  lo  B 

i.  d)  =  mJin  -  ra^iri 

7. 

For  each  p €  P  and  0  <  b<  B 

(a)  A»(p)  =  li::.,<f,(p); 

8. 

Return  (Ai,A2,...,Ab); 

Figure  8.  The  Algorithm  for  Predicting  the  Effect  on  Signal 
Propagation  of  a  Targeted  Manipulation.  The  algorithm  for 
predicting  the  effect  on  signal  propagation  of  a  targeted  manipulation 
on  signaling  network  with  connectivity  G.  The  'c'  and  'p'  superscripts 
are  used  to  denote  parameters  in  the  control  and  perturbed  versions, 
respectively,  of  the  SPN. 
doi:10.1371/joumal.pcbi.1000005.g008 


significant.  We  use  a  t-test  to  determine  whether  this  change  is 
statistically  significant  for  protein  v  at  time  block  b.  Computing  the 
t-test  for  two  distributions  (control  and  perturbation)  requires 
knowledge  of  the  mean  and  Pp^t)  as  well  as  the  variance 
and  rr^^  for  both  distributions.  In  order  to  obtain  these 
parameters  for  the  control  network,  a  large  number,  X,  of 
independent  simulations  is  run.  Simulation  i  provides  a  single 
series  of  markings,  The  mean  is  then  computed: 

E  «A(v) 

_  '=1 

f'c.A.v  —  ^ 

The  variance  is  computed  similarly: 

2  _  i=l  ^ _ _ 

The  parameters  Pp  i,  „  and  „  for  the  perturbed  network  are 
computed  as  described  above  by  substituting  the  perturbed  network 
for  the  control  network.  Using  tliese  parameters,  the  t-value  for 
molecule  v  at  time  block  b  can  be  computed  from  the  formula 


t  —  value  = 


l^c,h,v 


The  statistical  significance  of  the  differenee  can  then  be  obtained  by 
comparing  the  t-value  to  the  desired  critical  value. 

Note  that  the  DifferentialSimulate  proeedure  and  the 
associated  significance  test  can  predict  the  effect  not  only  of 


perturbations,  but  also  of  any  two  different  experimental  (or 
cellular)  conditions  imposed  on  the  same  signaling  network.  As 
a  result,  in  addition  to  perturbation  experiments,  our  method 
can  also  be  used  to  study  the  effects  of  other  phenomena  that 
induce  changes  in  the  propagation  of  signal  through  a  signaling 
network. 

Cell-Specific  Signaling  Network  Models 

Figure  1  shows  the  signaling  network  we  analyzed.  We  obtained 
the  core  connectivity  from  a  published  literature  survey  on  the 
EGFR  network  [57].  We  added  to  this  several  other  well- 
established  interactions  taken  from  literature  [36-43].  The 
response  of  this  network  to  various  perturbations  was  measured 
and  simulated  in  two  separate  breast  cancer  cell  lines:  MDA231 
and  BT549.  The  core  signaling  Petri  net  used,  is  captured 

by  the  following  signaling  proteins  and  interactions:  places  (the  set 
P):  VegFR,  Vsrc,  VRac,  VMEKK4,  VmEK4,  Vjnk,  VmEKK6,  VmEKB; 
VSTAT,  VGrbZj  Vshc;  VsoS,  Vrb,  VelK,  VbaD,  VnfKB,  Vras,  VgaBI, 
Vpip:s,  VpisK,  VpDKl,  VpTEN,  Ve-Raf,  VaktALKBI,  VmEK,  VgsKS/!; 
VaMPK:  VtsC2j  VmaPK1,2)  Vrsk,  VrecB,  Va^xOR-Raptorj  Webpi, 
Vp70S6K,  Vp38,  and  Vpse- 

Protein  interaction  network  motifs  (the  combination  of  arcs  and 
transitions):  VEGFR“^VGrb2)  VGrb2^vshc)  vshc^vsos,  vsos^vras, 
VGrb2^VGABl,  VgaBI^Vpisk,  VegFR“^Vsrg,  Vsrc^VstAT, 
Vpi:sK“^VpiP;t;  VpiP,S^VpDKl)  VRAS^V,,.Raf,  Vpe)K1^VakT)  Vras^ 
VRac)  VRac^VMEKK4,  VmEKK4“^VmEK4,  VmEK4^Vjnk,  VjnK"^ 
VSTAT,  VRac^VMEKKB)  VmEKK6^VmEK6,  VMEKB^VpSs,  Vpas"^ 
VSTAT,  VpDKl^Vp70S6K,  VpTEN^VAKT,  VAKT^Vc-Raf,  VAKT^  VGSK:t/), 
VAKT^VTSC2,  VaK'I^VaMPK,  VtvK'iHvbAD,  VaK'I-^VnfKB,  VakT^ 
Vp70SBK,  VlkBI^VamPK,  VmEK^VmaPK1,2)  VmAPK1,2^Vrb, 
VmAPKI,2^VelK,  VmAPK1,2^VstAT,  VGSK:t/r^VTSC2j  VamPK"^ 
VTSC2)  VMAPK1,2^VEGFR,  VMAPK1,2^VTSC2;  VMAPKl,2^Vp70SBK, 
VmAPK1,2^Vrsk,  VRSK^VTSC^,  VTSCt2^ VRhc4);  VRbeb^VjuTOR-Raptor, 
^AKT  ^^rnTOR-Raptor?  En^’^poR-Raptor  ^^4EBPl3  Vm'pOR-Raptor  ^ 
Vp70S6K,  Vp70S6K^VEGFR,  VsRG^VsRC,  VRacHvRac,  VMEKK4^ VMEKK4, 
VMEK4^VMEK4,  VJNK^VJNK,  VMEKK6^  VMEKK6,  VMEK6^  VmEKB; 
VSTAT^VSTAT)  VGrb2^VG^b2)  VsEc^Vshc>  Vsos^VsoS)  VRAS^VRAS; 
Vc-RaHVc-Raf,  VMEK^VMEK,  VMAPK1,2^VMAPK1,2)  VrrHvrb,  VELK^ 
VeLK)  VRSK^VRSK)  VGABI^VGAB1)  VpIP:^^ Vpip:n  Vp38-|Vp38,  VpI3K^ 
Vpi:3K,  VpdkHvpdKI,  VAKT^VAKT,  VBAD^VBAD,  VNFKB^  VNFKB, 
VAMPK^V7tMPK,  V,„TOR-Raptor^VmTOR-Rapto^,  Vp7oS6K^  Vp70SBK, 
VpS6^VpS6>  V4EBP1^V4EBP1• 

Notice  that  the  last  several  edges  are  self-inhibitory  loops  (e.g., 
VRas^VRas)•  These  loops  are  used  to  model  regulatory  mechanisms 
that  are  not  present  in  the  model  network. 

For  molecules  that  do  not  have  specific  inhibitory  edges 
modeled  in  the  network,  we  use  the  self-inhibitory  loop  to  prevent 
exponential  increase  in  the  token  counts  and  to  model  inhibitory 
mechanisms  beyond  the  scope  of  the  network.  For  example, 
consider  the  molecule  Ras  in  the  network  shown  in  Figure  1.  In 
the  model,  this  protein  is  not  inhibited.  However,  biologically  we 
know  that  Ras  has  intrinsic  GTPase  function  which  inactivate 
itself  In  order  to  model  this,  we  introduce  a  self-inhibitory  loop. 

The  differences  between  the  two  cell-specific  networks  are 
captured  by  following  activity  assignments  to  various  proteins  in 
the  SPN.  In  the  MDA231  cell  line,  H“®={vr„„  vegf}  and 
L^®  =  0.  In  the  BT549  cell  line,  H®'^={vegf}  and 
L  =  {vpten}- 

Of  the  two  perturbations  we  considered,  one  significandy 
knoeked  down  the  activity-level  of  TSC2  and  the  other  knocked 
down  mTOR-Raptor.  Wilde  the  core  SPN  still  modeled  these 
networks,  separate  perturbed  activity-assignments  were  required  for 
eaeh  eell  line-perturbation  pairing:  =  L'^®U{vtsc2}, 
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T  MB-mTOR  _  T  MB,  ,f  ^  j  BT-TSC2  _  y  BT,  ,(  >  j 

L  -  L  U  {VmTOR-Raptor},  ^  -  L  U  {VxSCs}  ^nd 

T  B'I'-mTOR  _  T  BT|  I  f  ^ 

^  '-^jVmTOR-Raptor/- 

Setup  for  Perturbation  Experiments 

Cell  culture  and  stimulation.  Human  MDA-MB-231 
(MDA231)  and  BT549  breast  cancer  cells  were  routinely 
maintained  in  RPMI  supplemented  with  10%  FBS.  For  signaling 
experiments,  logarithmically  growing  cells  were  serum-starved  for 
16  hours  and  then  subjected  to  treatments  by  epidermal  growth 
factor  (EGF)  (20  ng/mL)  (Cell  Signaling  Technology,  Beverly, 
Massachusetts)  for  30  minutes.  Controls  were  incubated  for 
corresponding  times  with  DMSO.  To  knock  down  TSC2,  cells 
were  treated  with  short  interfering  RNA  (siRNA)  (Dharmaeon, 
Lafayette,  Colorado)  for  72  hours  prior  to  EGF  stimulation.  Control 
cells  were  transfected  with  non- targeting  (N/T)  siRNA  (Dharmaeon, 
Lafayette,  Colorado)  prior  to  EGF  treatment. 

Antibodies.  The  following  antibodies  were  used  for 
immunoblotting:  anti-phospho-p44/ 42  MAPK,  anti-phospho- 
GSK3/?  (S21/S9);  anti-phospho-AKT(ser473);  anti-phospho- 
TSC2(T1462);  anti-phospho-mTOR(S2448);  anti-phospho- 
P70S6K(T389)  (Cell  Signaling  Technology,  Boston, 
Massachusetts);  and  anti-/?-Actin  (Sigma-Aldrich,  St.  Louis, 
Missouri). 

SDS-PAGE  and  immunoblotting.  Cells  were  lysed  by 
incubation  on  ice  for  15  minutes  in  a  sample  lysis  buffer 
(50  mM  Hepes,  150  mM  NaCl,  1  mM  EGTA,  10  mM  Sodium 
Pyrophosphate,  pH  7.4,  100  nM  NaF,  1.5  mM  MgC12,  10% 
glycerol,  1%  Triton  X-100  plus  protease  inhibitors;  aprotinin, 
bestatin,  leupeptin,  E-64,  and  pepstatin  A).  Cell  lysates  were 
centrifuged  at  15,000  g  for  20  minutes  at  4°C.  The  supernatant 
was  frozen  and  stored  at  —  20°C.  Protein  concentrations  were 
determined  using  a  protein-assay  system  (BCA,  Bio-Rad, 
Hercules,  California),  with  BSA  as  a  standard.  For  immuno¬ 
blotting,  proteins  (25  |a,g)  were  separated  by  SDS-PAGE  and 
transferred  to  Hybond-C  membrane  (GE  Healthcare,  Piscataway, 
New  Jersey).  Blots  were  blocked  for  60  minutes  and  incubated 
with  primary  antibodies  overnight,  followed  by  goat  anti-mouse 
IgG-HRP  (1:30,000;  Cell  Signaling  Technology,  Boston, 
Massachusetts)  or  goat  anti-rabbit  IgG-HRP  (1:10,000;  Cell 
Signaling  Technology)  for  1  hour.  Secondary  antibodies  were 
detected  by  enhanced  chemiluminescence  (ECL)  reagent  (GE 
Healthcare,  Piscataway,  New  Jersey).  All  experiments  were 
repeated  a  minimum  of  three  independent  times. 

Setup  for  perturbation  simulations.  To  select  the  block 
duration  parameter,  B,  we  compared  the  experimentally  derived  fold 
change  of  AKT  in  the  MDA23 1  cell  line  to  the  AKT  fold  changes 
predicted  for  B  =  10,  20,  50,  100,  and  1000.  We  found  B  =  20  to  be 
the  best  fit  and  used  this  value  for  all  simulations  in  this  study. 

We  also  experimented  with  input  parameter  r,  the  numbers  of 
individual  simulation  runs  averaged  per  simulation.  We  tried  a 
range  extending  from  /■=  100  to  r=  1000.  We  found  that  no 
observable  changes  occurred  in  trends  for  )&400.  Therefore, 
r  =  400  was  used  for  all  simulations  in  this  study. 

We  considered  both  the  zero  and  experimentally  derived  initial 
states  as  the  initial  markings  for  the  TSC  inhibition  simulations. 
The  experimental  states  for  both  cell  lines  were  derived  from 
western  blots  produced  from  cells  that  were  incubated  in  DMSO 
and  serum-starved  for  16  hours.  Unsampled  molecules  were 
assigned  a  marking  of  zero.  The  number  of  tokens  assigned  to 
each  sampled  molecule  was  directly  proportional  to  the  darkness 
of  the  line  on  the  western  blot.  This  assignment  was  done  by  hand, 
though  devising  automated  and  standardized  methods  for  the 
construction  of  experimentally  derived  initial  states  is  an  important 
direction  for  future  work.  Since  most  of  the  molecules  in  the 
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network  were  not  sampled,  only  mTOR-Raptor,  TSC2,  GSK3/1, 
p70S6K,  AKT,  and  MAPK  were  given  non-zero  markings.  The 
initial  markings  used  are  shown  in  Table  1. 

Since  experimental  results  for  the  mTOR-Raptor  inhibition 
were  obtained  from  literature,  we  did  not  have  experimental 
results  for  construction  of  experimentally  derived  initial  states. 
Therefore,  we  used  the  zero  initial  states  for  the  mTOR-Raptor 
inhibition  simulations. 

Results 

In  order  to  evaluate  the  accuracy  of  our  simulation  method,  we 
tested  its  predictions  of  the  effect  of  targeted  manipulations  on  two 
cell-specific  versions  of  the  signaling  network  depicted  in  Figure  1 . 
In  each  cell  line,  a  TSC2-specific  siRNA  was  applied  and  the 
concentration  of  several  key  proteins  in  the  EGFR  network  were 
sampled  30  minutes  after  stimulation  with  EGF.  This  was 
repeated  in  the  absence  of  the  TSC  2  siRNA  in  order  to  obtain 
the  concentration  in  the  control  network.  We  also  collected  a 
corpus  of  literature  detailing  the  response  of  signaling  proteins 
activity-levels  to  the  inhibition  of  mTOR-Raptor  using  Rapamya- 
cin  [43,58].  Predictions  were  generated  by  our  simulator  for  the 
TSC2  and  mTOR-Raptor  perturbations  in  both  cell  lines. 

Simulation 

To  simulate  a  perturbation,  we  used  two  networks  both  based 
on  the  signaling  network  shown  in  Figure  1:  the  control  network 
for  the  cell  line  and  the  perturbed  network  for  the  cell  line.  The 
control  networks  for  the  cell  lines  were  different  because  it  was 
important  to  model  the  cell-specific  mutations.  In  the  case  of  the 
BT549  cell  line,  there  is  a  mutation  that  leads  to  the  loss  of  PTEN, 
which  makes  AKT  always  active.  In  the  MDA23 1  cell  line,  there  is 
a  mutation  in  Ras,  which  makes  it  always  active.  As  shown  in  the 
formulation  of  the  model,  these  are  modeled  using  fixed  activity 
assignments  in  the  simulator. 

The  TSC2  (mTOR-Raptor)  perturbed  network  for  a  cell  line 
was  created  by  taking  the  control  network  and  fixing  the  activity- 
level  of  TSC2  (mTOR-Raptor)  to  zero  for  the  duration  of  the 
simulation,  effectively  simulating  the  pharmacological  inhibition  of 
the  protein.  For  each  cell-line/perturbation  pair,  we  ran  the 
simulator  on  the  control  and  perturbed  networks  using  the 
DifferentialSimulate  procedure  in  Figure  8  which  computed  the 
change  in  token-counts  induced  by  the  perturbation  for  all 
proteins  in  the  model.  These  change  plots  are  shown  in  Figure  9 
for  TSC2  and  in  Figure  10  for  mTOR-Raptor.  We  ran  the 
simulations  using  both  experimentally  derived  initial  states  as  well 

Table  1.  Experimentally  Derived  Initial  Markings  Used  in  the 

Simulations. 


Molecule 

MB231 

BT549 

Control 

TSC2 

Inhibited 

Control 

TSC2 

Inhibited 

mTOR-Raptor 

0 

1 

5 

5 

TSC2 

0 

0 

6 

0 

GSKB/j 

5 

3 

3 

6 

p70S6K 

0 

2 

0 

0 

AKT 

0 

0 

7 

7 

MAPK 

2 

6 

1 

2 
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as  zero  initial  states.  The  initial  state  used  did  not  change  the 
overall  trends  observed  in  the  simulations. 

Using  the  t-test  described  in  the  Methods  section,  we  also 
computed  the  statistical  significance  of  the  final  time  block  (b  =  20) 
for  each  molecule  considered.  For  each  molecule  considered,  400 
runs,  20  time  blocks,  and  50  samples  were  used.  With  the 
exception  of  GSK3/?  which  did  not  show  a  significant  response  to 
the  perturbation,  the  changes  of  all  other  proteins  sampled  were 
beyond  the  0.05  significance  level  (see  Table  2).  The  statistical 
insignificance  of  the  change  in  GSK3/J  is  not  surprising  since,  as 
shown  in  Figure  1,  GSK3/1  is  solely  activated  by  LKB,  a  molecule 
fixed  high  in  both  cell  lines.  Thus,  we  should  not  expect  either 
perturbation  to  have  a  significant  effect  on  the  activity  of  GSK3/1, 
which  is  what  the  t-value  indicates. 

Cell  Line  Western  Blot  Results 


Experimental  Results 

After  the  TSC2  perturbation  was  applied  to  a  ceU  line,  the 
protein  concentrations  were  collected  using  western  blots.  Details 
are  given  in  the  Materials  and  Methods  section.  The  western  blot 
results  are  shown  in  Figure  9. 

Discussion 

As  can  be  seen  in  Table  3,  our  method  correctly  predicted  the 
relative  protein  activity-level  changes  induced  by  the  TSC2 
perturbation  in  both  cell  lines,  for  most  molecules  sampled. 
Notice  that  no  change  (-)  was  reported  for  the  predicted  response  of 
MAPK  to  the  TSC2  perturbation  despite  the  fact  that  a  small 
change  did  occur  in  its  marking  during  the  simulation  (see  Figure  9) 

Simulation  Results 


MDA231 


®AKT 

(p)mTOR 

(p)mAPK1,2 

(p)GSK3b 

(p)p70S6K 

®TSC2 


12  3  4 


Titm  bbck  (b) 


BT549 


(p)akt 

(p)mTOR 
(p)mAPK1,2 
(p)GSK3b 
@p70S6K 
®TSC2 

12  3  4 


B 

B 

■ 

B 

■ 

■ 

B 

B 

B 

B 

B 

B 

B 

m 

Figure  9.  The  Results  of  the  TSC2  Perturbation  Experiments  and  Simulations.  In  the  western  blots,  columns  (or  lanes)  are  as  follows:  (1) 
non-targeting  (NT)  control  sIRNA,  (2)  NT  sIRNA-tEGF,  (3)  TSC2  sIRNA,  (4)  TSC2  sIRNA-tEGF.  The  effect  of  the  TSC2  sIRNA  on  a  given  molecule  can  be 
assessed  by  comparing  column  4  against  column  2.  For  each  molecule  In  the  western  blot,  there  Is  a  corresponding  simulation  curve  showing  the 
predicted  change  In  protein  activity  over  time.  For  the  purposes  of  this  analysis,  we  compared  the  concentration  change  after  20  time  steps  (the  left¬ 
most  data  points  In  the  plots)  for  each  molecule.  Each  simulation  point  corresponds  to  the  average  of  400  measurements  that  were  computed  using 
the  procedure  described  In  Figure  8.  Experimentally  derived  Initial  states  were  used  In  the  simulations.  The  results  of  both  the  experiments  and 
simulations  are  qualitatively  summarized  In  Table  3. 
dol:10.1371/journal.pcbl.1000005.g009 
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Figure  10.  The  Predicted  Response  of  the  Network  to  an  mTOR-Raptor  Perturbation.  The  predicted  response  of  the  network  to  a  mTOR- 
Raptor  perturbation  in  the  (A)  MDA231  and  (B)  BT549  ceii  iines.  Our  method  predicts  that  the  amount  of  availabie  AKT  increases  in  response  to  the 
perturbation,  which  is  in  agreement  with  resuits  pubiished  in  the  iiterature  [43,58].  Our  method  aiso  predicts  that  the  activity-ievei  of  p70S6K  in  the 
IVIDA231  ceii  iine  decreases  in  response  to  the  perturbation,  which  has  been  observed  experimentaiiy  [59],  Each  point  corresponds  to  the  average  of 
400  measurements  that  were  computed  using  the  procedure  described  in  Figure  8. 
doi:1 0.1 371  /journai.pcbi.1  OOOOOS.gOI  0 


and  the  t-value  for  the  change  is  significant  (see  Table  2).  At  first, 
interpreting  this  value  as  no  change  may  seem  misleading.  However, 
one  of  the  significant  challenges  in  experimental  perturbation 
experiments  is  separating  true  system  responses  from  the 
background  noise  created  by  experimental  variables  that  cannot 
be  precisely  controlled  (among  them  cell  population  sizes, 
variability  in  microarray  antibody  binding  effectiveness,  and 
limited  sensitivity  of  hardware  and  software  used  to  quantify 
experimental  results).  As  a  result,  a  common  practice  is  to  only 
consider  those  substantial  changes  that  are  well  beyond  the 
background  noise  level.  Our  interpretation  of  the  small  predicted 
change  in  MAPK  as  no  change  reflects  the  fact  that  such  small 
changes  would  not  be  detectable  in  microarray  or  western  blot 
results.  Thus,  though  such  a  small  fluctuation  might  have  occurred 
in  the  real  data,  it  would  not  have  been  detected  by  the  biologists 
and  most  likely  would  appear  in  the  experimental  data  to  have  not 
changed. 

Similar  reasoning  guided  our  decision  to  characterize  the 
simulation  (and  experimental)  results  as  either  up  ( 'f ),  down  ( J, ), 
or  no  change  (— )  in  general.  Since  the  amount  of  protein 

Table  2.  The  T-Values  for  the  Molecules  Sampled  in  the 

Microarray. 


Molecule 

t-Value  in  MDA231 

t-Value  in  BT549 

mTOR-Raptor 

41.72 

30.53 

TSC2 

21.65 

8.28 

GSK3(S 

0.42 

0.10 

p70S6K 

14.22 

5.83 

AKT 

6.60 

9.55 

MAPK 

16.35 

18.93 

The  critical  value  for  an  alpha  value  of  0.05  with  50  samples  is  2.0086.  Note  that 
the  t-values  for  all  molecules  except  for  GSK3/i  are  larger  than  this  value, 
confirming  that  these  changes  are  statistically  significantly, 
doirl  0.1 371/lournal.pcbi.1 000005.1002 
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registered  in  a  microarray  or  western  blot  is  not  always  a  reliable 
indicator  of  the  exact  amount  of  protein  (or  protein  form)  being 
measured,  biologists  are  often  reluctant  to  report  degrees  of 
increases  or  decreases — ^preferring  binary  observations  such  as  up 
or  down  which  are  less  subject  to  influence  by  extraneous 
experimental  conditions.  It  is  true  that  our  simulation  method 
produces  precisely  quantified  increases  or  decreases  which  can  be 
taken  to  indicate  degrees  of  change  in  response  to  perturbations. 
However,  as  experimental  techniques  cannot  reliably  measure 
degrees  of  increase  or  decrease,  we  judged  the  binary  (up  or  down) 
characterization  to  be  a  more  reliable  way  of  validating  our 
method.  Certainly,  our  method  provides  additional  information  of 

Table  3.  Summary  of  the  Effect  of  Perturbation  Reported  by 

Experimental  and  Simulated  Methods. 


Molecule 

MB231 

BT549 

Experiment 

Simulation 

Experiment 

Simulation 

mTOR-Raptor 

T 

T 

t  or  - 

T 

TSC2 

1 

i 

i 

1 

GSK3/i 

- 

- 

- 

- 

p70S6K 

T 

T 

i 

T 

AKT 

i  or  — 

i 

1 

1 

MAPK 

- 

- 

- 

- 

The  up  arrow  ( I )  indicates  that  the  perturbation  caused  a  rise  in  the  level  of 
the  phosphorylated  protein;  the  straight  line  (— )  indicates  no  change;  and  the 
down  arrow  ( i )  indicates  that  a  decrease  occurred.  Values  in  the  Experiment 
column  were  estimated  by  comparing  lanes  4  and  2  in  Figure  9.  We  estimated 
the  Simulation  column  by  determining  whether  the  top  quartiie  of  the 
distribution  for  the  final  time  point  was  above,  below,  or  at  zero.  In  some  cases 
it  is  difficult  to  judge  for  certain  whether  the  total  quantity  of  the 
phosphorylated  protein  changed  or  remained  the  same — both  for  the 
experimental  and  computational  cases.  In  these  situations,  we  indicated  the 
uncertainty  by  listing  the  possible  changes  that  the  protein  could  have  feasibly 
undergone. 
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degrees  of  change  and  we  consider  studying  the  accuracy  of  these 
degrees  to  be  an  important  area  for  future  work. 

Our  method  also  correctly  predicted  the  activity-level  change  of 
AKT  in  response  to  mTOR-Raptor  inhibition  as  reported  by  a 
number  of  studies  [43,58].  Further,  our  method  predicted  that, 
when  mTOR-Raptor  is  inhibited,  the  level  of  p70S6K  in  the 
MDA231  cell  line  decreased,  which  also  had  been  observed 
experimentally  [59]. 

The  only  incorrect  prediction  made  by  our  method  was  the 
activity-level  change  of  p70S6K  in  the  BT549  cell  line.  However, 
BT549  cells  contain  an  RB  mutation  [49]  which  could  alter 
p70S6K  phosphorylation  [60] .  It  is  a  strength  of  our  simulator  that 
the  discrepancy  between  our  method’s  predictions  and  the 
experimental  results  identified  a  section  of  the  model  in  which 
additional  connectivity  has  been  found  which  might  account  for 
the  difference  observed. 

The  predictions  made  by  our  simulator  would  be  exceedingly 
difficult  to  derive  by  visual  or  manual  inspection.  Table  4  shows  the 
number  of  paths  between  several  pairs  of  compounds  within  tire 
network.  Where  there  is  more  than  one  path  connecting  two 
molecules,  feed  forward  and  feed  backward  loops  are  present. 
Attempting  to  determine,  by  hand,  how  these  different  loops  will 
interact  with  one  another  is,  by  itself,  a  difficult  endeavor  even  when 
not  considering  the  additional  task  of  deriving  the  rest  of  the  network 
dynamics  simultaneously.  For  the  larger  networks  that  are  now 
becoming  available,  computational  analysis  becomes  even  more 
crucial  to  obtaining  insights  into  the  dynamic  behavior  of  the  network. 

Despite  the  complexity  of  the  network  dynamics,  it  was 
straightforward  to  find  and  integrate  the  connectivity  information 
used  to  build  it.  Most  of  the  information  sources  [36-43] 
established  the  existence  of  various  pathways  and  provided  few  or 
no  biochemical  or  kinetic  details.  As  a  result,  the  literature  we  used 
would  have  provided  little  assistance  is  building  a  parameterized 
Petri  net  or  ODE  model.  Due  to  the  proliferation  of  curated 
signaling  network  repositories  and  searchable  literature  archives, 
connectivity  information  is  relatively  abundant  which  makes  the 
ad  hoc  assembly  of  networks  a  relatively  straightforward  endeavor. 
This  further  underscores  the  advantage  of  using  our  method  over 
ODEs  or  parameterized  Petri  nets  to  quickly  model  and 
characterize  some  of  the  dynamics  of  a  signaling  network. 

For  simulations  that  will  be  compared  to  experimental  results,  the 
time  parameter  must  be  selected  carefully.  The  time  parameter,  B, 
indicates  how  many  time  blocks  our  method  will  simulate.  The  time 
block  is  an  abstract  unit  of  time.  Therefore,  before  comparing 
experimental  results  and  predictions,  it  is  necessary  to  determine  how 
many  seconds,  minutes,  or  hours  correspond  to  a  time  block.  This 
can  be  done  by  comparing  a  prediction  of  the  simulator  with  the 
experimentally  measured  activity-level  of  one  or  two  proteins  at 
several  time  points  in  order  to  determine  what  time  blocks 
correspond  to  the  different  sampled  time  points.  In  the  present 
study,  we  calibrated  our  time  blocks  only  once  for  two  cell  lines  and 
six  experimental  conditions  (two  cell  lines,  with/without  TSC2, 
with/without  mTOR-Raptor).  To  select  the  time  parameter  we  used 
the  experimentally  measured  activity  changes  in  two  proteins  at  two 
time  points.  In  contrast  to  other  predictive  dynamic  analysis  tools 
which  require  multiple  time  points  and  multiple  protein  samples  in 
order  to  calibrate  simulation  and  model  parameters,  our  method  has 
relatively  low  time  and  resource  investment. 

Besides  the  time  parameter,  the  other  component  of  our 
simulations  which  involved  experimentally  obtained  knowledge 
was  the  initial  states.  The  experimentally  derived  initial  states 
require  that  some  experimental  data  be  available  providing 
information  on  the  initial  concentrations  of  individual  signaling 
proteins  in  the  network  prior  to  stimulation.  However,  in  the 
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Table  4.  Number  of  Paths  Connecting  Several  Pairs  of 
Compounds  in  the  EGFR  Model  Used  in  Our  Simulations 


Source  Protein 

Destination  Protein 

Number  of  Paths 

EGFR 

TSC2 

7 

AKT 

mTOR-Raptor 

6 

MEK 

EGFR 

4 

AKT 

p70S6K 

8 

The  multiple  paths  connecting  pairs  of  proteins  highlight  the  complex 
interactions  present  within  the  network  that  give  rise  to  its  overall  dynamic 
behavior. 

doi:1 0.1 371  /journal.pcbi.l  000005.t004 

network  that  we  considered  here,  the  overall  behavior  of  the 
network  and  of  individual  signaling  proteins  was  resilient  to 
changes  in  the  initial  states  used.  Zero  and  experimentally  derived 
both  produced  the  same  overall  change  predictions.  Thus,  while 
experimentally  derived  initial  states  may  be  important  for  the 
simulation  of  some  networks,  it  may  well  be  the  case  that  many 
networks  (such  as  the  one  we  considered  in  this  paper)  can  be 
simulated  without  this  knowledge — further  reducing  the  experi¬ 
mental  work  that  must  be  done  prior  to  simulation. 

The  fact  that  our  simulator  produced  accurate  predictions  for  a 
variety  of  experimental  conditions  using  the  one  core  network 
model  and  set  of  simulation  parameters  also  distinguishes  our 
method  from  other  predictive  approaches.  The  only  aspects  of  the 
model  that  were  modified  during  the  simulations  were  activity- 
levels  reflecting  the  immediate  effects  of  either  the  underlying 
tumor  mutations  (Ras  and  PTEN)  or  the  perturbations  (mTOR- 
Raptor  and  TSC2  targeted  manipulation).  In  contrast,  the 
accuracy  of  ODEs  and  Petri  nets  predictions  are  known  to  be 
sensitive  to  small  changes  to  the  model.  For  comparative  studies 
such  as  the  one  conducted  in  this  paper,  an  ODE  or 
parameterized  Petri  net  model  might  need  to  be  re-constructed 
with  different  parameters  for  each  experimental  condition  of 
interest.  As  a  result,  while  it  is  possible  to  obtain  our  simulation 
results  using  these  models,  it  remains  beyond  the  capabilities  of 
any  existing  ODE  or  parameterized  Petri  net  system  to  provide 
insights  into  the  effects  of  experimental  conditions  on  the  dynamic 
behavior  of  a  signaling  network  with  so  little  initial  time  and 
resource  investment. 

Though  our  method’s  predictions  will  not  be  as  accurate  as  the 
results  returned  by  a  correctly  parameterized  ODE,  biologists 
using  our  method  can  derive  information  about  a  network’s 
dynamic  behavior  without  having  to  conduct  extensive  experi¬ 
mentation  and  computationally  expensive  parameter  estimation. 
This  novel  capability  offers  scientists  the  exciting  prospect  of  being 
able  to  test  hypotheses  regarding  signal  propagation  in  sllico.  As  a 
result,  by  using  our  method  researchers  can  evaluate  a  wide  array 
of  network  responses  in  order  to  determine  the  most  promising 
experiments  before  even  entering  the  laboratory. 
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Abstract 

Background:  In  systems  biology  the  experimentalist  is  presented  with  a  selection  of  software  for 
analyzing  dynamic  properties  of  signaling  networks.  These  tools  either  assume  that  the  network  is 
in  steady-state  or  require  highly  parameterized  models  of  the  network  of  interest.  For  biologists 
interested  in  assessing  how  signal  propagates  through  a  network  under  specific  conditions,  the  first 
class  of  methods  does  not  provide  sufficiently  detailed  results  and  the  second  class  requires  models 
which  may  not  be  easily  and  accurately  constructed.  A  tool  that  is  able  to  characterize  the  dynamics 
of  a  signaling  network  using  an  unparameterized  model  of  the  network  would  allow  biologists  to 
quickly  obtain  insights  into  a  signaling  network's  behavior. 

Results:  We  introduce  PathwayOracle,  an  integrated  suite  of  software  tools  for  computationally 
inferring  and  analyzing  structural  and  dynamic  properties  of  a  signaling  network.  The  feature  which 
differentiates  PathwayOracle  from  other  tools  is  a  method  that  can  predict  the  response  of  a 
signaling  network  to  various  experimental  conditions  and  stimuli  using  only  the  connectivity  of  the 
signaling  network.  Thus  signaling  models  are  relatively  easy  to  build.  The  method  allows  for  tracking 
signal  flow  in  a  network  and  comparison  of  signal  flows  under  different  experimental  conditions.  In 
addition,  PathwayOracle  includes  tools  for  the  enumeration  and  visualization  of  coherent  and 
incoherent  signaling  paths  between  proteins,  and  for  experimental  analysis  -  loading  and 
superimposing  experimental  data,  such  as  microarray  intensities,  on  the  network  model. 

Conclusion:  PathwayOracle  provides  an  integrated  environment  in  which  both  structural  and 
dynamic  analysis  of  a  signaling  network  can  be  quickly  conducted  and  visualized  along  side 
experimental  results.  By  using  the  signaling  network  connectivity,  analyses  and  predictions  can  be 
performed  quickly  using  relatively  easily  constructed  signaling  network  models.  The  application  has 
been  developed  in  Python  and  is  designed  to  be  easily  extensible  by  groups  interested  in  adding 
new  or  extending  existing  features.  PathwayOracle  is  freely  available  for  download  and  use. 


Background 

Reconstructing  cellular  signaling  networks  and  under¬ 
standing  how  they  work  are  major  endeavors  in  cell  biol¬ 
ogy.  The  scale  and  complexity  of  these  networks,  however, 


render  their  analysis  using  experimental  biology 
approaches  alone  very  challenging.  As  a  result,  computa¬ 
tional  methods  have  been  developed  and  combined  with 
experimental  biology  approaches,  producing  powerful 
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tools  for  the  analysis  of  these  networks.  These  tools  aid 
biologists  in  interpreting  existing  experimental  findings, 
evaluating  hypotheses,  enumerating  possible  biological 
behaviors,  and,  ultimately,  in  quickly  designing  experi¬ 
ments  that  maximize  the  amount  of  useful  information 
gained.  By  assisting  biologists  in  maximizing  the  amount 
of  information  obtained  from  their  experiments  through 
improved  experimental  design  and  more  thorough  analy¬ 
sis  of  results,  computational  tools  increase  the  pace  of  sci¬ 
entific  discovery. 

Biological  network  analysis  can  generally  be  classified  as 
either  structural  or  dynamic  [1].  Structural  analysis  pro¬ 
vides  insights  into  global  properties  of  the  network, 
among  them  decomposition  of  the  network  into  func¬ 
tional  modules  (e.g.,  [2]),  enumeration  of  signaling  paths 
connecting  arbitrary  protein  pairs  (e.g.,  [3-5]),  and  the 
identification  of  key  pathways  that  determine  the  behav¬ 
ior  of  the  network  (e.g.,  [2,6-10]).  Dynamic  methods,  on 
the  other  hand,  simulate  the  actual  propagation  of  signals 
through  a  network  by  predicting  the  changes  in  the  con¬ 
centration  of  signaling  proteins  over  time.  These  predic¬ 
tions  will  be  of  varying  degrees  of  resolution  and  accuracy, 
depending  largely  on  the  accuracy  and  level  of  detail  of 
the  model  from  which  they  are  produced. 

The  prevailing  methods  for  dynamic  analysis  involve  sys¬ 
tems  of  ordinary  differential  equations  (ODEs)  [11,12]. 
These  approaches  require  kinetic  parameters  for  the  indi¬ 
vidual  biochemical  reactions  involved  in  the  signaling 
process.  This  requirement  often  poses  a  significant  hurdle 
for  researchers  as  the  numerical  values  of  such  parameters 
are  difficult  to  obtain  and  may  be  the  object  of  the 
researcher's  project  in  the  first  place.  In  [13],  we  presented 
a  novel  signaling  network  simulation  method  which  uses 
a  non-parametric  Petri  net  model  of  network  to  predict 
the  signal  flow  under  various  experimental  conditions. 
Our  simulation  method  uses  a  novel  technique  to  approx¬ 
imate  the  interaction  speeds  and  predicts  the  qualitative 
behavior  of  the  signaling  network  dynamics. 

The  advantage  of  our  method  over  ODEs  is  the  wide  avail¬ 
ability  of  connectivity-based  models  of  signaling  net¬ 
works,  and  the  relative  speed  with  which  they  can  be 
constructed.  Numerous  databases  exist  which  catalog 
known  signaling  interactions  (e.g.,  [14-16]).  Thus,  the 
existence  and  type  (activating  or  inhibition)  of  an  interac¬ 
tion  can  often  be  inferred  directly  from  literature  and/or 
these  databases.  This  presents  a  stark  contrast  to  the 
kinetic  parameters  required  by  ODEs,  the  numerical  val¬ 
ues  for  many  of  which  must  be  determined  experimen¬ 
tally  for  each  experimental  condition  and  cell  line  of 
interest  [2]. 


In  this  paper,  we  present  the  software  tool  Pathway  Oracle, 
an  integrated  environment  for  connectivity-based  struc¬ 
tural  and  dynamic  analysis  of  signaling  networks,  sup¬ 
porting 

•  visualization  of  signaling  network  connectivity; 

•  two  versions  of  the  simulation  method  described  in  [13] 
where 

-  the  first  allows  prediction  of  signal  flow  through  a  given 
network  for  a  specific  experimental  condition,  and 

-  the  second  predicts  the  difference  in  signal  flow  through 
a  given  network  induced  by  two  different  experimental 
conditions; 

•  enumeration  of  the  paths  connecting  arbitrary  pairs  of 
nodes  in  the  network;  and 

•  visualization  of  experimental  concentration  data  on  the 
signaling  network  display. 

In  future  releases  we  plan  on  expanding  capabilities  in  all 
three  areas  of  analysis  -  dynamic,  structural,  and  experi¬ 
mental  -  with  a  focus  on  providing  effective  ways  of  inte¬ 
grating  results  from  each  together. 

PathwayOracle  has  been  designed  in  a  modular  fashion  in 
order  to  facilitate  extension  of  existing  capabilities  and  the 
addition  of  new  features. 

Since  PathwayOracle's  most  distinctive  analytical  capabil¬ 
ity  involves  the  signaling  Petri  net  simulator,  a  new 
dynamic  analysis  technique  for  signaling  networks,  we 
first  provide  an  overview  of  the  signaling  Petri  net  mode¬ 
ling  approach.  Then  in  subsequent  sections,  we  focus  on 
PathwayOracle  and  explain  the  architecture  and  core  con¬ 
cepts  underlying  the  tool  and  then  examine  the  individual 
features,  how  they  can  be  used,  and  how  they  compare  to 
existing  tools. 

The  Signaling  Petri  Net  Simulator 

Petri  nets  provide  a  graphical  and  executable  model  of 
processes  in  which  information  or  material  flows  among 
a  series  of  places  or  entities  [17].  A  Petri  net  consists  of 
places,  transitions,  and  tokens  (see  Figure  1).  Quantities 
of  tokens  are  assigned  to  individual  places.  This  assign¬ 
ment  is  called  a  marking.  As  Figure  1  illustrates,  the  net¬ 
work  flow  is  modeled  by  the  reassignment  of  tokens  to 
individual  places  in  the  Petri  net  in  response  to  transition 
firings. 

A  signaling  Petri  net  is  an  extension  of  the  Petri  net  for¬ 
malism  to  model  a  signaling  network.  Places  are  signaling 
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(a)  (b) 

Figure  I 

An  example  of  how  tokens  move  among  places.  In  a 

Petri  net,  quantities  of  tokens  are  assigned  to  places.  In  (a), 
three  tokens  are  assigned  to  place  and  zero  tokens  are 
assigned  to  place  pg.  The  two  places  are  connected  by  a  tran¬ 
sition,  t|.  The  arcs  in  and  out  of  t,  indicate  the  direction  in 
which  tokens  move.  When  t|  fires,  it  moves  some  number  of 
tokens  from  p/^and  puts  them  in  pg.  In  (b),  transition  t,  has 
fired  and  moved  two  tokens  from  p/^to  pg. 


proteins  and  transitions  implement  directed  protein  inter¬ 
actions;  each  transition  models  the  effect  of  a  source  pro¬ 
tein  on  a  target  protein.  The  marking  of  (number  of 
tokens  in)  protein  p  at  time  £  is  interpreted  as  the  activity- 
level  of  that  protein  -  the  number  of  activated  molecules 
of  that  type.  Figure  2  shows  the  correspondence  between 
a  signaling  network  and  a  signaling  Petri  net  model. 

The  signaling  Petri  net  simulator  models  signal  flow  as  the 
pattern  of  token  accumulation  and  dissipation  within 
proteins  over  time  in  the  Petri  net.  Through  transition  fir¬ 
ings,  the  source  can  influence  the  marking  of  (the  number 
of  tokens  assigned  to)  the  target,  modeling  the  way  that 
signals  propagate  through  protein  interactions  in  cellular 
signaling  networks. 

In  order  to  overcome  the  issue  of  modeling  reaction  rates 
in  the  network,  signaling  dynamics  are  simulated  by  exe¬ 
cuting  the  signaling  Petri  net  (SPN)  for  a  set  number  of 
steps  (called  a  run)  multiple  times,  each  time  beginning  at 
the  same  initial  marking.  For  each  run,  the  individual  sig¬ 
naling  rates  are  simulated  via  generation  of  random 
orders  of  transition  firings  (interaction  occurrences). 
When  the  results  of  a  large  enough  number  of  runs  are 
averaged  together,  we  find  that  the  change  in  distribution 


of  tokens  in  the  network  correlate  with  experimentally 
measured  changes  in  the  activity-levels  of  individual  pro¬ 
teins  in  the  underlying  signaling  network.  In  essence,  the 
tokenized  activity-levels  computed  by  our  method  should 
be  taken  as  abstract  quantities  whose  changes  over  time 
correlate  to  changes  that  occur  in  the  amounts  of  active 
proteins  present  in  the  cell.  It  is  worth  noting  that  some  of 
the  most  widely  used  experimental  techniques  for  protein 
quantification  -  western  blots  and  microarrays  -  also 
yield  results  that  are  treated  as  indications,  but  not  exact 
measurements,  of  protein  activity-levels  within  the  cell. 
Thus  in  some  respects,  the  predictions  returned  by  our 
SPN-based  simulator  can  be  interpreted  like  the  results  of 
a  western  blot  or  microarray  experiment  looking  at 
changes  relative  to  "control". 

During  a  simulation  run,  the  simulator  imposes  a  strict 
ordering  on  transition  firing  such  that  it  creates  a  two-time 
scale  simulation.  The  smaller  time  scale  is  discretized  as 
the  firing  of  a  single  transition.  This  unit  is  referred  to  as 
the  firing  time  scale.  Firing  steps  are  nested  within  a  larger 
time  scale,  called  time  blocks,  within  which  each  transition 
is  fired  exactly  once.  The  values  returned  by  the  simulator 
are  the  averaged  token-counts  for  each  protein  at  each 
time-block  (across  all  runs). 

Figure  3  provides  a  small  example  of  a  simulation  run 
whose  duration  is  two  time  blocks.  As  mentioned  previ¬ 
ously,  within  a  given  time  block,  each  transition  fires 
exactly  once.  Thus,  in  the  table  (Figure  3(c)),  there  is  one 
column  for  each  transition  in  each  time  block.  The  order¬ 
ing  of  the  transitions  is  shuffled  in  each  time  block  in 
order  to  sample  a  different  set  of  signaling  rates  within  the 
networks. 

In  the  first  time  block,  transition  fires  first:  it  reads  2 
tokens  out  of  Grb2  and  places  2  additional  tokens  in  Ras. 
Transition  tj  fires  second,  reading  3  tokens  out  of  Grb2. 
Transition  £3  is  evaluated  last.  The  final  marking  for  the 
network,  highlighted  as  the  red  column  in  block  1  is  used 
by  the  simulator  as  the  marking  for  that  block  when  aver¬ 
aging  across  runs. 

At  the  conclusion  of  block  2,  compare  the  values  high¬ 
lighted  in  red  in  the  Initial  column  and  at  the  end  of  both 
blocks.  Note  how  the  distribution  of  tokens  have  changed 
over  the  course  of  the  simulation.  Grb2  has  the  same 
number  of  tokens,  implying  that  its  activity-level  has 
remained  unchanged  -  this  is  consistent  with  the  signal¬ 
ing  network  since  no  activating  or  inhibiting  edges  affect 
it  in  the  model.  AKTs  token-count  has  risen,  consistent 
with  the  fact  that  it  is  only  activated  in  the  signaling  net¬ 
work.  Ras's  token-count  has  fallen  which  is  one  plausible 
behavior  of  the  system  since  it  is  activated  by  Grb2,  but 
inhibited  by  AKT. 
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Figure  2 

An  example  signaling  network  and  its  corresponding  Petri  net.  An  example  signaling  network  (a)  and  its  correspond¬ 
ing  Petri  net  (b).  Each  signaling  protein  in  the  network,  A,  6,  and  C,  is  designated  as  a  place  pg,  and  pQ  A  signaling  interaction 
becomes  a  transition  node  and  its  input  and  output  arcs.  Note  that  the  connectivity  for  an  activating  edge  differs  from  that  of 
an  inhibitory  edge. 


Implementation 

PathwayOracle  is  written  in  Python  [18].  The  user  experi¬ 
ence  is  oriented  around  visualization  of  and  interaction 
with  three  main  types  of  data:  the  signaling  network, 
markings,  and  paths.  At  any  given  time,  one  signaling  net¬ 
work  is  open,  which  is  the  basis  for  all  analyses.  Any  sim¬ 
ulation  or  concentration  data  is  loaded  and  inspected  as 
markings.  Currently  all  static  analyses  revolve  around 
paths,  which  are  the  third  data  type.  In  the  following  sub¬ 
sections,  these  individual  data  types  and  the  user  inter¬ 
faces  to  them  are  discussed  in  more  detail. 

The  Signaling  Network  Model 

While  the  implementation  of  our  methods  use  the  signal¬ 
ing  Petri  net  model  discussed  in  an  earlier  section  of  this 
paper,  we  provide  a  simpler  and  more  convenient  repre¬ 
sentation  of  the  network  to  the  user  which  omits  the  inter¬ 
nal  topology  of  the  transitions  and  allows  the  user  to 
specify  interactions  simply  as  either  activating  or  inhibit¬ 


ing.  Thus,  for  the  remainder  of  this  paper  we  use  the  fol¬ 
lowing  definition  of  the  signaling  network  which  is 
consistent  with  the  experience  the  user  will  have  when 
working  with  PathwayOracle.  The  signaling  network  con¬ 
nectivity  is  a  directed  graph  G  =  (V,  E)  where 

•  V  is  the  set  of  nodes,  which  are  signaling  proteins  and 
complexes  (hereafter  referred  to  collectively  as  signaling 
nodes)  and 

•  £  is  the  set  of  edges,  which  are  signaling  interactions. 
Each  edge  is  of  one  of  two  types:  w  ^  y  for  activation  and 
u  V  for  inhibition. 

Within  PathwayOracle,  each  signaling  node  has  a  name, 
unique  within  the  network.  A  signaling  edge  has  no  prop¬ 
erties  besides  its  type  and  is  only  defined  by  its  source  and 
target. 
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Figure  3 

An  example  signaling  Petri  net  simulation,  (a)  is  the  signaling  network  being  simulated,  (b)  is  the  signaling  Petri  net  that 
models  that  signaling  Petri  net.  The  table  in  (c)  provides  the  markings  for  the  Petri  net  over  the  course  of  a  simulation  run 
whose  duration  is  two  time  blocks.  The  proteins  are  given  the  initial  marking  shown  in  the  Initial  column.  Each  subsequent  col¬ 
umn  corresponds  to  a  single  time  step  during  which  one  transition  fired,  producing  a  new  marking  of  the  network.  The  bold 
number  in  each  column  indicates  which  protein's  marking  was  affected  by  the  transition  that  fired  in  that  time  step.  The  red 
columns  -  always  the  last  time  step  in  the  block  -  highlight  the  markings  whose  values  would  be  averaged  and  used  as  part  of 
the  final  result.  These  red  columns  are  the  sources  of  the  markings  that  PathwayOracle  reports. 
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In  order  to  facilitate  the  rapid  construction  of  such  signal¬ 
ing  network  models,  we  devised  a  file  format  called  the 
Connectivity  Format.  It  is  capable  of  expressing  both  gen¬ 
eral  networks  as  well  as  paths.  When  representing  a  net¬ 
work  in  the  format,  as  shown  in  the  example  in  Figure 
4(b),  one  signaling  interaction  is  written  on  a  line  with 
the  format 

u->v  or  u  -  \  V 

where  u  is  the  name  of  the  source  signaling  node  and  v  is 
the  name  of  the  target  signaling  node.  Each  node  is  taken 
to  represent  the  active  form  of  the  protein  it  is  named  for. 
Thus,  from  the  example  above,  the  interaction  PI-3- 
K-^AKT  means  that  the  active  form  of  PT3-K  increases  the 
activity-level  of  AKT  whereas  the  interaction  PTENAKT 
means  that  the  active  form  of  PTEN  decreases  the  activity- 
level  of  AKT.  While  these  types  of  unparameterized  rela¬ 
tionships  can  be  represented  in  SBML,  SBML  was  designed 
for  encoding  much  more  information  than  just  connectiv¬ 
ity  [19].  As  a  result,  we  deemed  it  appropriate  to  design  a 
more  concise  format  for  our  purposes.  Elowever,  in  a 
future  release,  PathwayOracle  will  support  loading  and 
saving  in  the  SBML  format. 

At  a  given  point  in  time,  only  one  signaling  network  can 
be  open  in  PathwayOracle.  The  main  window  displays  a 


graphical  representation  of  the  network.  The  layout  of  the 
network  can  be  modified  by  dragging  nodes  or  by  shift- 
clicking  on  edges  to  create,  remove,  or  move  waypoints. 
These  layouts  can  be  saved  with  the  network  and  loaded 
again. 

Signaling  Network  Markings 

In  signaling  networks,  signal  flow  is  measured  and  quan¬ 
tified  as  the  fluctuation  of  concentrations  of  various  forms 
of  signaling  proteins  over  time.  In  PathwayOracle,  we 
model  concentrations  using  the  concept  of  a  network 
marking,  which  was  adapted  from  Petri  nets  in  which  it 
was  first  used  [9]. 

Markings 

In  PathwayOracle,  a  marking,  ju  is  an  assignment  of  real 
values  to  the  nodes  of  a  signaling  network  such  that  every 
signaling  node  receives  a  value.  Earlier,  the  concept  of  a 
marking  was  introduced  as  the  assignment  of  tokens  to 
protein  places  in  the  signaling  Petri  net.  In  a  signaling 
Petri  net,  tokens  are  discrete.  In  PathwayOracle,  a  marking 
is  an  average  of  the  markings  from  many  independent 
simulation  runs,  which  gives  rise  to  the  real,  rather  than 
integral  values,  assigned  by  the  marking. 

As  discussed  earlier,  the  value  of  the  marking  of  a  signal¬ 
ing  node,  ju{v),  can  be  interpreted  as  an  estimate  of  the 


(a) 


(b) 


Figure  4 

An  example  of  a  Network  in  the  Connectivity  Format,  (a)  A  graphical  representation  of  a  signaling  network's  connec¬ 
tivity.  (b)  The  signaling  network  in  (a)  written  in  the  Network  Connectivity  Format. 
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concentration  or  change  in  concentration  of  the  active 
form  of  the  signaling  protein  v  (we  call  the  amount  of  the 
active  form  of  the  signaling  protein  its  activity-level).  The 
two  different  versions  of  the  simulator  generate  markings 
with  these  different  meanings.  The  first  simulator  predicts 
the  signal  flow  due  to  an  experimental  condition  and  gen¬ 
erates  markings  whose  values  are  taken  to  represent  the 
actual  activity-level  of  signaling  protein  present  over  the 
assumed  basal  levels.  The  second  version  of  the  simulator 
predicts  the  difference  in  signaling  due  to  changing  exper¬ 
imental  conditions.  The  values  assigned  by  markings  pro¬ 
duced  by  this  simulator  correspond  to  the  change  in  the 
activity-level  of  the  protein  induced  by  the  change  in 
experimental  condition.  This  will  be  discussed  further  in 
the  Results  and  Discussion  section. 

Marking  Series 

In  order  to  model  signal  flow,  a  single  marking  is  not 
enough  since  it  only  provides  a  single  snapshot  of  concen¬ 
trations  throughout  the  network.  A  marking  series  is  an 
sequence  of  markings,  {ju^,  jU2,...,  jUj)  in  which  the  marking 
is  a  snapshot  of  the  concentration  distribution  at  time 
step  t.  Thus,  it  is  possible  to  see  how  the  activity-level  of 
protein  v  changed  by  plotting  the  values  jUi{v),  jUjiv),..., 
//j-(y).  PathwayOracle  provides  the  ability  to  do  this. 

PathwayOracle  supports  loading  a  marking  series  dataset 
from  comma-separated  value  (.csv)  files.  As  shown  in  Figure 
5(a),  the  file  has  a  header  row  which  specifies,  for  each 
column,  the  name  of  the  molecule  whose  concentration 
values  will  appear  in  that  column.  Each  subsequent  row 
contains  the  value  assignments  for  a  marking:  the  second 
row  contains  the  marking  for  time  step  1,  the  third  row 
contains  the  marking  for  time  step  2,  and  so  on. 

Marking  Groups 

In  many  experiments,  the  activity-level  of  various  proteins 
are  sampled  at  different  time  points  and  under  different 


experimental  conditions.  Since  the  marking  series  is  not 
able  to  represent  changes  due  to  different  experimental 
conditions,  we  introduced  the  more  general  concept  of  a 
marking  group  in  which  each  marking  can  correspond  to 
an  arbitrary  activity-level  distribution.  Each  marking  is 
given  a  descriptive  label  that  can  be  used  to  identify  the 
conditions  under  which  the  activity-level  was  sampled. 

Like  the  marking  series,  a  marking  group  is  loaded  from  a 
.CSV  file.  Elowever,  unlike  the  marking  series  in  which  each 
row  corresponds  to  a  time  step,  in  the  marking  group, 
each  row  corresponds  to  an  independent  marking  (exper¬ 
imental  condition).  As  shown  in  Eigure  5(b),  the  first  row 
is  a  header  row  specifying  the  molecule  names  for  each 
column,  the  first  column  specifies  the  names  for  the  indi¬ 
vidual  markings  (experimental  conditions). 

The  Marking  Manager 

PathwayOracle  includes  a  specific  user-interface,  the  Mark¬ 
ing  Manager,  designed  to  manage  the  three  different  types 
of  markings.  The  Marking  Manager  provides  a  central 
interface  within  which  it  is  possible  to  view  all  markings 
loaded  and  inspect  them  in  ways  that  are  relevant  to  their 
type  (marking,  marking  series,  or  marking  group).  The 
specific  ways  in  which  markings  can  be  inspected  will  be 
discussed  further  in  the  Results  section. 

Signaling  Paths 

The  current  structural  analysis  capabilities  available  in 
PathwayOracle  allow  inspection  of  signaling  paths  within 
the  network.  A  signaling  path  p  is  a  sequence  of  nodes,  (fj, 
Vj,...,  v,J  where  y,-  e  W1  <  f  <  fe,  and  (y,-,  y,-  +  1)  e  £  VI  <  i 
<k.  In  this  case,  we  say  that  node  yl  is  the  source  of  path 
p,  and  node  Vj^  is  the  target  of  p.  Given  a  path,  a  variety  of 
statistics  may  be  of  interest  to  the  user.  Additionally,  it 
may  be  useful  to  view  the  path  within  the  larger  network. 
PathwayOracle  provides  these  capabilites  which  will  be 
discussed  in  the  Results  and  Discussion  section. 
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Figure  5 

Examples  of  marking  series  and  group  file  formats,  (a)  An  example  marking  series  dataset  in  the  comma-separated  value 
file  format.  The  first  row  specifies  the  signaling  proteins  whose  concentrations  were  measured.  Each  row  thereafter  specifies 
the  concentration  for  a  given  time  step:  row  i  specifies  the  concentrations  for  each  signaling  protein  at  time  step  /  -  I .  (b)  An 
example  marking  group  dataset  in  the  comma-separated  value  file  format.  The  first  row  specifies  the  signaling  proteins  whose 
concentrations  were  measured.  The  first  column  specifies  the  names  for  each  marking  in  the  group  dataset.  The  numbers  in 
each  row  specify  the  concentration  measured  for  each  signaling  protein  in  that  marking. 
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(a) 


EGFR  ->  PI-3-K  ->  AKT  -|  c-Raf  ->  MEK 
EGER  ->  Ras  ->  c-Raf  ->  MEK 


(b) 


Figure  6 

An  example  of  a  Path  in  the  Connectivity  Format,  (a)  A  graphical  representation  of  two  signaling  paths,  (b)  The  signal¬ 
ing  paths  in  (a)  represented  in  the  Connectivity  Format.  Each  line  corresponds  to  a  single  signaling  path. 


Sets  of  paths  can  be  saved  to  a  file  and  loaded  back  into  a 
session.  Like  networks,  paths  are  also  stored  in  the  Con¬ 
nectivity  Format.  When  representing  a  set  of  paths,  as 
shown  in  Figure  6,  the  full  node  names  and  the  edge  types 
are  written  so  that  all  path  information  is  directly  availa¬ 
ble  within  the  file  itself.  One  line  contains  one  path. 

Results 

PathwayOracle  provides  a  variety  of  tools  for  analyzing  the 
structural  and  dynamic  properties  of  a  signaling  network 
based  on  its  connectivity.  While  its  main  differentiating 
feature  is  the  ability  to  predict  signal  flow  through  a  net¬ 
work  using  only  the  connectivity  of  the  signaling  network, 
PathwayOracle  also  provides  the  ability  to  visualize  the 
network,  analyze  its  connectivity,  and  inspect  concentra¬ 
tion-based  experimental  data. 

With  the  exception  of  the  signaling  Petri  net  simulator, 
PathwayOracle's  features  can  be  found  in  various  combi¬ 
nations  in  other  tools.  Figure  7  provides  a  matrix  of  the 
features  and  capabilities  of  several  tools  most  commonly- 
used  for  signaling  network  analysis.  While  other  tools 
support  a  variety  of  simulation  techniques,  PathwayOra¬ 
cle,  alone,  provides  non-parameterized  simulation  capa¬ 
bilities.  ft  is  worth  noting  that  the  commercial  software 
package  Celllllustrator  [20]  provides  Petri  net-based  sim¬ 
ulation  capabilities.  The  difference  between  Celllllustrator 
and  PathwayOracle  Petri  net  approaches  is  the  extensive 
set  of  kinetic  parameters  required  by  Celllllustrator  in 
order  to  simulate  a  biological  system.  In  this  regard, 
hybrid  functional  Petri  nets,  the  underlying  technology 
used  by  Celllllustrator,  are  not  significantly  different  from 
ODEs. 

Another  important  distinguishing  characteristic  of  Path¬ 
wayOracle  is  the  combination  of  features  that  it  supports. 
Biological  network  analysis  is  a  multi-faceted  process  that 
may  involve  structural,  dynamic,  and  data  analysis  in  par¬ 
allel.  Whereas  other  tools  tend  to  focus  on  one  or  two  of 
these  general  areas  of  analysis,  we  considered  it  important 
for  PathwayOracle  to  incorporate  all  three  in  order  to  pro¬ 


vide  the  researcher  a  single  environment  in  which  all  their 
analysis  could  be  done.  In  future  releases  we  plan  to 
increase  PathwayOracle's  support  for  all  three  of  these 
directions  of  investigation:  structural,  dynamic,  and  data 
analysis. 

In  the  remainder  of  this  section,  we  discuss  the  features 
currently  available  in  PathwayOracle. 

Network  Visualization 

As  in  many  other  computational  analysis  tools  for  signal¬ 
ing  networks  (e.g.,  [20,21]),  an  interactive  graphical  rep¬ 
resentation  of  the  signaling  network  connectivity  is  at  the 
center  of  the  PathwayOracle  interface.  The  main  window 
provides  a  visualization  of  the  signaling  network  connec¬ 
tivity.  This  visualization  interface  allows  the  user  to  edit 
the  layout  of  the  network  by  clicking  on  and  dragging 
nodes  and  by  sk/ft-clicking  on  edges  to  create,  remove,  or 
move  waypoints.  Waypoints  are  points  that  lie  on  an 
edge.  Flolding  down  shift  will  display  all  edge  waypoints. 
Existing  waypoints  can  be  dragged  to  change  the  path  that 
an  edge  follows.  Right-clicking  on  a  waypoint  will  remove 
it.  Left-clicking  on  a  straight  segment  of  the  edge  will  cre¬ 
ate  a  new  waypoint. 

The  network  visualization  also  provides  a  view  onto 
which  path  and  experimental  data  analysis  may  be 
mapped.  As  will  be  discussed  in  subsequent  sections, 
selected  paths  may  be  highlighted  in  this  view  and  mark¬ 
ings  from  experiments  can  set  the  colorings  of  individual 
nodes. 

Network  Signal  Flow  Simulation 

The  main  feature  differentiating  PathwayOracle  from  other 
tools,  such  as  CellDesigner  [20]  and  COPASI  [22],  is  its 
ability  to  simulate  signal  flow  using  an  unparameterized 
signaling  network  model.  Simulations  can  be  performed 
in  two  different  ways.  In  the  first  (Single  Simulation),  the 
simulator  predicts  the  signal  flow  through  the  network  for 
a  specific  experimental  condition.  In  the  second  (Differen¬ 
tial  Simulation),  the  simulator  predicts  the  difference  in 
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Figure  7 

A  comparison  of  features  supported  by  tools  commonly  used  for  signaling  network  analysis.  The  table  shows  the 
features  and  analytical  capabilities  supported  by  different  tools  commonly  used  for  the  analysis  of  signaling  networks.  Tools 
included  in  the  comparison  are:  CellDesigner  [20],  Celllllustrator  [24],  CellNetAnalyze  [25],  COPASI  [22],  Cytoscape  [2 1],  the 
System  Biology  Toolkit  for  Matlab  [26],  and  PathwayOracle. 


signal  flow  due  to  two  different  experimental  conditions 
on  the  same  network.  These  simulation  methods  them¬ 
selves  are  described  in  [13].  Here  we  focus  on  how  simu¬ 
lations  are  configured,  run,  and  analyzed. 

Whereas  the  consensus  networks  typically  represent  the 
connectivity  in  normal  cells,  many  experiments  are  con¬ 
ducted  on  abnormal  cells  in  which  oncogenic  mutations, 
gene  knockous,  and  pharmacological  inhibitors  have 
altered  the  behavior  of  various  signaling  nodes  in  the  net¬ 
work.  In  PathwayOracle  users  can  model  these  cell-  and 
experiment-specific  conditions  by  specifying  each  signal¬ 
ing  node  as  either  High,  Low,  or  Free.  The  High  state  mod¬ 
els  any  condition  under  which  a  protein's  activity-level  is 
held  high  for  the  duration  of  the  experiment.  This  may  be 
due  to  external  stimulation  or  a  known  mutation  in  the 
protein  that  makes  it  constitutively  active,  for  example. 
Similarly,  a  Low  state  models  any  phenomenon  that  forces 
a  protein  to  have  a  persistently  suppressed  activity-level. 
This  may  be  due  to  mutations  that  render  the  protein  inac¬ 
tive,  gene  knockouts,  or  pharmacological  inhibitors  that 
force  the  activity-level  of  the  protein  low.  In  general,  most 
signaling  nodes  will  be  Free,  which  means  that  their  activ¬ 
ity-level  is  unconstrained  throughout  the  simulation. 
Only  those  nodes  designated  as  High  or  Low  will  have 
their  activity-level  fixed  for  the  duration  of  the  simulation. 

In  order  for  a  protein  to  be  held  high  during  the  simula¬ 
tion,  it  is  necessary  to  indicate  the  initial  activity-level  that 
the  protein  will  be  elevated  to.  This  is  done  by  specifying 
the  number  of  tokens  that  the  protein  will  receive.  Since  a 
protein  with  a  High  state  cannot  be  inhibited  (even  if 
inhibitory  edges  target  it  in  the  actual  network),  the  pro¬ 


tein's  activity  level  will  never  fall  below  this  initial  value. 
The  initial  value  for  a  High  protein  is  indicated  by  placing 
it  in  parentheses  next  to  the  protein's  name,  as  shown  in 
Figure  8.  Two  other  parameters  that  must  be  specified  for 
a  simulation  are: 

•  the  number  of  simulation  mns  to  perform  and 

•  the  number  of  time  blocks 

The  number  of  runs  sets  the  number  of  independent  sim¬ 
ulations  whose  time  block  markings  are  averaged  together 
to  yield  the  overall  simulation  markings.  In  general,  using 
more  mns  is  a  tradeoff  between  reliability  of  the  results 
and  simulation  speed.  In  practice,  the  number  of  mns 
needed  is  dependent  on  the  signaling  network  model  and 
should  be  selected  by  observing  the  reproducability  of  the 
simulation  results.  An  appropriate  number  of  iterations 
will  be  large  enough  so  that  for  a  given  experimental  con¬ 
dition,  the  results  are  very  similar  across  multiple  simula¬ 
tions. 

The  time  block,  as  discussed  earlier,  is  a  fundamental  unit 
of  time  in  the  simulator.  The  appropriate  number  of  time 
blocks  for  which  to  simulate  will  vary  depending  on  the 
size  of  the  signaling  network  and  the  scale  of  the  network 
behavior  of  interest.  Generally  it  should  be  selected  by 
mnning  simulations  for  a  variety  of  time  block  values  and 
determining  which  yields  the  most  biologically  reasona¬ 
ble  activity-level  changes  for  a  known  protein.  While  this 
is  a  manual  process  in  the  current  version  of  PathwayOra¬ 
cle,  we  are  investigating  automated  methods  for  estimat- 
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Figure  8 

The  tokenized  simulator  user  interface,  (a)  The  setup  window  for  the  tokenized  simulator.  The  simulation  is  being  con¬ 
figured  to  have  two  High  nodes,  EGF  and  LKB-auto.  EGF  will  be  initialized  with  a  token-count  of  10,  LKB-auto  with  a  token- 
count  of  3.  The  token-count  of  AMPK  will  be  zero  for  the  duration  of  the  simulation,  (b)  The  setup  window  for  the  differential 
simulator.  Two  different  scenarios  are  being  compared  through  simulation:  different  token  assignments  are  being  tried  with 
EGF  and  LKB-auto,  with  and  without  AMPK  being  fixed  low.  (c)  The  plot  window  for  the  marking  series  generated  by  a  simu¬ 
lation.  Observe  that  the  signaling  nodes  whose  activity-levels  are  plotted  correspond  to  those  selected  in  the  checklist  directly 
to  the  left  of  the  plot. 
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ing  the  number  of  time  blocks  by  training  against 
experimental  time  series  data. 

In  PathwayOracle,  the  setup  window  for  the  Single  Simula¬ 
tion  (see  Figure  8(a))  prompts  the  user  for  a  single  experi¬ 
mental  condition.  The  setup  window  for  the  Differential 
Simulation  (see  Figure  8(b))  prompts  the  user  for  two 
experimental  conditions.  Both  simulators  produce  a 
marking  series.  The  tokenized  simulation  marking  series 
corresponds  to  the  activity-level  time  series  predicted  for 
the  specified  experimental  condition.  The  differential  sim¬ 
ulation  marking  series  corresponds  to  the  change  in  activ¬ 
ity-levels  over  time  produced  by  switching  from 
experimental  condition  2  to  experimental  condition  1. 
The  marking  series  produced  by  a  simulation  can  be 
accessed  through  the  Marking  Manager.  Choosing  to 
inspect  a  marking  series  will  present  the  user  with  a  blank 
plot.  By  selecting  signaling  nodes,  the  plot  is  populated  by 
the  marking  series  values  for  individual  nodes  over  time, 
as  shown  in  Figure  8(c). 

While  this  plot  generation  capability  exists  in  many  other 
dynamic  simulation  tools,  the  simplicity  of  the  model 
used  for  simulation  and  the  speed  with  which  a  simula¬ 
tion  runs  set  PathwayOracle  apart  from  other  tools  which 
require  specification  of  the  numerical  values  of  kinetic 
parameters  for  each  reaction  in  the  network  of  interest 
(e.g.,  [20,22]).  PathwayOracle,  because  of  its  novel 
approach,  does  not  have  such  requirements.  It  is  worth 
noting,  however,  where  PathwayOracle  provides  approxi¬ 
mations  of  signal  flow,  an  ODE  generates  the  actual  con¬ 
centration  changes  using  extremely  detailed  and  accurate 
models  of  the  underlying  biochemistry.  The  simulators  in 
PathwayOracle  provide  an  attractive,  time-  and  resource¬ 
saving  alternative  this  more  exhaustively  parameterized 
techniques.  In  particular,  PathwayOracle' s  features  will 
benefit  researchers  interested  in  quickly  assessing  charac¬ 
teristics  of  signal  flow  in  their  network. 

For  some  networks,  biologists  will  have  partial  knowledge 
of  kinetic  parameters  or  of  other  biological  details  which 
the  signaling  Petri  net  model  does  not,  at  present,  con¬ 
sider.  By  integrating  this  knowledge  into  the  simulator,  it 
may  be  possible  to  improve  the  simulator's  predictions. 
We  identify  this  as  a  direction  for  future  investigation.  As 
the  signaling  Petri  net  simulator  is  extended,  these  new 
capabilities  will  be  incorporated  in  future  releases  of  Path¬ 
wayOracle. 

Signaling  Path  Anaiysis 

The  use  of  the  simulators  and  plotting  tools  allows  the 
user  to  observe  trends  in  the  activity-level  of  individual 
signaling  nodes  over  time.  Since  the  activity-level  of  a 
node  is  determined  by  the  activity-level  of  other  nodes  in 
the  network,  the  activity-level  time  series  of  a  node  may  be 


explained  by  changes  in  the  activity-level  history  of  nodes 
upstream  of  it.  In  order  to  investigate  such  indirect  inter¬ 
actions,  it  is  useful  to  enumerate  all  the  paths  leading 
from  a  specific  protein  to  the  protein  of  interest.  Pathway- 
Oracle  provides  this  capability.  Additionally,  it  provides 
various  statistics  on  the  set  of  paths  linking  two  signaling 
nodes  as  well  as  a  classification  of  the  effect  of  each  path 
as  either  coherent  or  incoherent  (e.g.  [23]).  A  coherent  path 
is  a  directed  series  of  interactions  that  leads  from  :v:  to  y 
such  that  an  increase  in  the  activity-level  of  x  causes  an 
increase  in  the  activity  of  y  and  a  decrease  in  the  activity- 
level  of  X  causes  a  decrease  in  the  activity-level  of  y.  An 
incoherent  path  is  a  directed  series  of  interactions  leading 
from  X  to  y  such  that  an  increase  in  the  activity-level  of  x 
causes  a  decrease  in  the  activity-level  of  y  and  a  decrease  in 
the  activity-level  of  x  causes  a  increase  in  the  activity-level 
of  y.  It  is  possible  to  classify  a  path  p  as  either  coherent  or 
incoherent  by  counting  the  number  of  inhibitory  edges 
along  p.  A  path  with  an  even  number  of  inhibitory  edges  is 
coherent;  a  path  with  an  odd  number  of  inhibitory  edges  is  inco¬ 
herent  [5].  This  logic  is  assumed  in  PathwayOracle.  All  sim¬ 
ple  paths  (paths  without  loops)  connecting  two  specified 
signaling  nodes  are  enumerated  by  an  exhaustive  depth- 
first  search.  These  paths  then  are  classified  as  either  coher¬ 
ent  or  incoherent,  and  presented  to  the  user  for  further 
inspection  in  a  window  similar  to  the  one  shown  in  Figure 
9(a).  When  a  path  is  selected  in  the  results  window,  it  is 
highlighted  in  the  main  window,  allowing  the  user  to 
evaluate  it  within  the  context  of  the  complete  network 
(see  Figure  9(b)). 

Experimental  Data  Analysis 

A  model  of  the  connectivity  of  a  signaling  network  makes 
it  possible  to  identify  components  of  the  model  that  are 
inconsistent  with  experimental  data  or  visa  versa.  Path¬ 
wayOracle  enables  this  kind  of  analysis  by  allowing  users 
to  load  experimental  concentration  data  and  visualize  it 
both  as  a  heatmap  (see  Figure  10(a))  or  superimposed  on 
the  network  view  (see  Figure  10(b)).  Several  other  soft¬ 
ware  tools  provide  similar  capabilities  (e.g.,  [21]).  \nPath- 
wayOracle,  experimental  concentration  data  is  loaded  as  a 
marking  group  in  which  a  single  marking  corresponds  to 
a  condition  for  which  concentrations  were  sampled.  Fig¬ 
ure  10(a)  shows  a  marking  group  with  24  conditions 
(rows).  The  concentration  of  seven  signaling  proteins 
were  sampled  for  each  condition.  This  is  the  heatmap 
view  for  the  marking  group.  When  a  specific  marking  in 
the  group  is  selected,  the  colors  for  that  marking  are 
applied  to  the  network  view.  This  is  particularly  useful 
when  assessing  whether  the  experimental  data  is  consist¬ 
ent  with  the  interactions  in  the  model.  In  Figure  10,  the 
MDA231-B-DMS02  marking  has  been  superimposed  on 
the  network.  We  can  see  that  RSK  has  a  relatively  low  con¬ 
centration  despite  the  high  concentration  of  MAPK.  Given 
that,  in  the  model,  RSK  is  activated  by  MAPK,  this  combi- 
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Figure  9 

The  path  interrogation  user  interface,  (a)  The  result  window  enumerating  the  set  of  all  paths  between  Ras  and  mTORIrap- 
tor.  (b)  The  main  network  view  showing  the  selected  path  highlighted. 


nation  of  activity-levels  seems  unlikely  to  occur.  Such  an 
inconsistency  suggests  that  there  may  be  other  signaling 
interactions  contributing  to  the  overall  activity-level  of 
RSK.  Such  an  insight  can  help  a  researcher  quickly  identify 
areas  where  the  model  or  experimental  results  need  to  be 
re-evaluated  or  improved. 

Future  Directions 

Our  goal  is  to  develop  PathwayOracle  into  an  integrated 
and  expansive  suite  of  tools  that  allow  the  biologist  to 
extract  as  much  information  as  possible  from  models  of 
signaling  network  connectivity  and  experimental  data 
relating  to  those  models.  We  consider  future  directions  for 
PathwayOracle  to  fall  into  several  categories:  network  con¬ 
struction,  network  augmentation,  experimental  and  com¬ 
putational  analysis  integration,  and  architecture. 

One  of  the  benefits  of  working  with  connectivity  models 
of  signaling  networks  is  the  abundance  of  databases  and 
other  online  resources  that  publish  connectivity-level 
data.  Future  versions  of  PathwayOracle  will  have  support 
for  querying  such  databases  for  connectivity  components 
and,  ultimately,  for  automated  connectivity  construction 
based  on  a  set  of  signaling  nodes  specified  by  the  user. 


Analysis  of  network  connectivity  and  topology  is  increas¬ 
ingly  relevant  to  biological  research.  We  intend  to  expand 
PathwayOracle's  structural  analysis  features  to  include  the 
ability  to  search  for  and  identify  motifs  in  the  signaling 
networks. 

Network  connectivity  can  also  be  inferred  from  experi¬ 
mental  data,  which  provides  another  direction  for 
research  and  development.  By  using  experimental  results 
to  identify  inconsistencies  between  experimental  results 
and  the  current  network  model,  it  may  be  possible  for 
PathwayOracle  to  augment  the  network  with  new  connec¬ 
tivity  based  on  hints  supplied  by  experimental  results.  At 
present  only  experimental  concentration  data  is  sup¬ 
ported.  However,  as  experiments  produce  more  informa¬ 
tion  beyond  concentration  profiles  of  signaling  nodes,  we 
plan  to  expand  the  experimental  data  that  PathwayOracle 
can  load,  visualize,  and  use  as  part  of  network  analyses. 

Experimental  results  can  also  provide  computational 
analysis  methods  information  that  can  improve  their  final 
predictions  or  decompositions.  Taking  advantage  of  the 
additional,  potentially  obfuscated,  information  present  in 
experimental  results  to  improve  the  results  returned  by 
computational  tools  is  a  major  goal  for  future  versions  of 
PathwayOracle. 
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Figure  10 

The  marking  group  user  interface,  (a)  The  heat  map  visualization  of  a  marking  group.  The  selected  marking,  MDA23 1  -B- 
DMSO I ,  is  highlighted  in  blue,  (b)  The  color  distribution  for  the  selected  marking  in  the  group  is  applied  to  the  network  view 
in  the  main  window.  Note  that  signaling  nodes  for  which  values  were  not  given  are  not  assigned  a  color  on  the  valid  red  to 
green  spectrum. 


A  longer  term  direction  for  PathwayOrade  is  the  integra¬ 
tion  of  transcriptional  and  metabolic  network  analysis.  In 
the  biological  systems  of  interest,  the  behavior  of  any  one 
of  these  networks  is  dependent  on  the  characteristics  of 
the  other  two.  As  a  result,  developing  a  complete  under¬ 
standing  of  signaling,  transcriptional  regulation,  or 
metabolism  depends  in  part  on  integrating  knowledge 
from  the  others.  Finally,  an  ongoing  priority  in  the  design 
of  PathwayOrade  is  its  role  as  an  open  platform  for  the 
development  and  deployment  of  new  analytical  capabili¬ 
ties  by  other  groups.  Currently  PathwayOrade  employes  a 
modular  architecture  that  facilitates  easy  integration  of 
new  functionality.  However,  in  future  releases  we  plan  to 
expose  a  plugin  interface  which  will  make  it  easier  to 
developers  and  researchers  to  develop  and  deploy  tools 
within  PathwayOrade. 


Conclusion 

PathwayOrade  is  an  integrated  software  environment  in 
which  biologists  may  conduct  structural  and  dynamic 
analysis  of  signaling  networks  of  interest.  PathwayOrade  is 
distinguished  from  other  tools  in  the  field  of  systems  biol¬ 
ogy  by  its  ability  to  predict  the  signal  flow  through  a  net¬ 
work  using  a  simplified,  connectivity-based  model  of  the 
signaling  network.  Simulations  are  fast  and,  based  on  a 
published  study,  predictors  of  signal  propagation.  This 
novel  simulation  capability,  combined  with  support  for 
structural  analysis  of  connectivity  between  pairs  of  pro¬ 
teins  and  for  analysis  of  certain  kinds  of  experimental  data 
make  PathwayOrade  a  powerful  asset  in  the  experimental¬ 
ist's  endeavor  to  gain  a  more  complete  understanding  of 
the  cellular  signaling  network. 
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Availability  and  requirements 

Project  name:  PathwayOracle 

Project  home  page:  http://bioinfo.cs.rice.edu/pathwayor 
acle 

Operating  system(s):  Platform  independent 
Programming  language:  Python 
Other  requirements:  Python  2.4  or  higher 
License:  GNU  GPL 

Any  restrictions  to  use  by  non-academics:  None 
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Abstract 

Extracting  network-based  functional  relationships  within  genomic  datasets  is  an  important  challenge  in  the  computational 
analysis  of  large-scale  data.  Although  many  methods,  both  public  and  commercial,  have  been  developed,  the  problem  of 
identifying  networks  of  interactions  that  are  most  relevant  to  the  given  input  data  still  remains  an  open  issue.  Here,  we  have 
leveraged  the  method  of  random  walks  on  graphs  as  a  powerful  platform  for  scoring  network  components  based  on 
simultaneous  assessment  of  the  experimental  data  as  well  as  local  network  connectivity.  Using  this  method,  NetWalk,  we 
can  calculate  distribution  of  Edge  Flux  values  associated  with  each  interaction  in  the  network,  which  reflects  the  relevance 
of  interactions  based  on  the  experimental  data.  We  show  that  network-based  analyses  of  genomic  data  are  simpler  and 
more  accurate  using  NetWalk  than  with  some  of  the  currently  employed  methods.  We  also  present  NetWalk  analysis  of 
microarray  gene  expression  data  from  MCF7  cells  exposed  to  different  doses  of  doxorubicin,  which  reveals  a  switch-like 
pattern  in  the  p53  regulated  network  in  cell  cycle  arrest  and  apoptosis.  Our  analyses  demonstrate  the  use  of  NetWalk  as  a 
valuable  tool  in  generating  high-confidence  hypotheses  from  high-content  genomic  data. 
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Introduction 

All  important  challenge  in  the  analyses  of  high  throughput 
datasets  is  integration  of  the  data  with  prior  knowledge 
interactions  of  the  measured  molecules  for  the  retrieval  of  most 
relevant  biomolecular  networks  [1-7].  This  approaeh  facilitates 
interpretation  of  the  data  within  the  context  of  known  functional 
interactions  between  biological  molecules  and  subsequently  leads 
to  high-confidence  hypothesis  generation.  Typically,  this  proce¬ 
dure  would  entail  identification  of  genes  with  highest  or  lowest 
data  values,  which  is  then  followed  by  identification  of  associated 
networks.  However,  retrieval  of  most  relevant  biological  networks/ 
pathways  associated  with  the  upper  or  lower  end  of  the  data 
distribution  is  not  a  trivial  task,  mainly  because  members  of  a 
biological  pathway  do  not  usually  have  similar  data  values  (e.g. 
gene  expression  change),  which  necessitates  the  use  of  various 
computational  algorithms  for  finding  such  networks  of  genes 
[1,2,4,5,8-1 1].  One  class  of  methods  for  finding  relevant  networks 
utilize  optimization  procedures  for  finding  highest-scoring  subnet¬ 
works/pathways  of  genes  based  on  the  data  values  of  genes  [2,8] . 
Although  this  approach  is  likely  to  result  in  highly  relevant 
networks,  it  is  computationally  expensive  and  ineffieient,  and  is 
therefore  not  suitable  for  routine  analyses  of  functional  genomics 
data  in  the  lab.  The  most  popular  of  the  existing  methods  of 
extraction  of  relevant  networks  from  genomic  data,  however, 
usually  involve  a  network  building  strategy  using  a  pre-defined 
focus  gene  set,  which  is  typically  a  set  of  genes  with  most  significant 


data  values  (e.g.  most  over-expressed  genes)  [1,7].  The  network  is 
buHt  by  “filling  in”  other  nodes  from  the  network  either  based  on 
the  enrichment  of  interactions  for  the  focus  set  (IPA  -Ingenuity 
Pathway  Analysis)  [1],  or  based  on  the  analysis  of  shortest  paths 
between  the  focus  genes  (MetaCore)  [7,12].  Both  methods  aim  at 
identifying  genes  in  the  network  that  are  most  central  to 
conneetmg  the  focus  genes  to  each  other.  Problems  associated 
with  these  methods  have  been  outlined  previously  [7].  However 
perhaps  most  importantly,  the  central  genes  identified  by  these 
methods  may  have  incoherent  data  values  with  the  focus  genes 
(e.g.  the  central  genes  may  have  reduced  expression  while  the 
focus  genes  may  have  increased  expression),  as  data  values  of 
nodes  are  not  accounted  for  during  the  network  construction 
process  using  the  seed  gene  list.  This  may  result  in  uninformative 
networks  that  are  not  representative  of  the  networks  most 
significantly  represented  in  the  genomic  data  (see  Results).  In 
addition,  these  methods  do  not  account  for  genes  with  more  subtle 
data  values  that  collectively  may  be  more  important  than  those 
with  more  obvious  data  values  [13].  Although  powerful  data 
analysis  methods  for  finding  sets  of  genes  with  significant,  albeit 
subtle,  expression  changes  have  been  developed  (e.g.  GSEA  [13], 
Molecular  concept  maps[14],  GenMAPP[15]),  such  an  approach 
has  not  been  incorporated  into  methods  for  extracting  interaction 
networks  that  are  most  highlighted  by  the  data. 

In  order  to  overcome  these  problems,  we  have  employed  the 
method  of  random  walks  in  graphs  for  scoring  the  relevance  of 
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Author  Summary 

Analysis  of  high-content  genomic  data  within  the  context 
of  known  networks  of  interactions  of  genes  can  lead  to  a 
better  understanding  of  the  underlying  biological  pro¬ 
cesses.  However,  finding  the  networks  of  interactions  that 
are  most  relevant  to  the  given  data  is  a  challenging  task. 
We  present  a  random  walk-based  algorithm,  NetWalk, 
which  integrates  genomic  data  with  networks  of  interac¬ 
tions  between  genes  to  score  the  relevance  of  each 
interaction  based  on  both  the  data  values  of  the  genes  as 
well  as  their  local  network  connectivity.  This  results  in  a 
distribution  of  Edge  Flux  values,  which  can  be  used  for 
dynamic  reconstruction  of  user-defined  networks.  Edge 
Flux  values  can  be  further  subjected  to  statistical  analyses 
such  as  clustering,  allowing  for  direct  numerical  compar¬ 
isons  of  context-specific  networks  between  different 
conditions.  To  test  NetWalk  performance,  we  carried  out 
microarray  gene  expression  analysis  of  MCF7  cells  subject¬ 
ed  to  lethal  and  sublethal  doses  of  a  DNA  damaging  agent. 
We  compared  NetWalk  to  other  network-based  analysis 
methods  and  found  that  NetWalk  was  superior  in 
identifying  coherently  altered  sub-networks  from  the 
genomic  data.  Using  NetWalk,  we  further  identified  p53- 
regulated  networks  that  are  differentially  involved  in  cell 
cycle  arrest  and  apoptosis,  which  we  experimentally 
tested. 


interactions  in  the  network  to  the  data.  The  method  of  random 
walks  has  been  well-established  for  structural  analyses  of  networks, 
as  it  can  fully  account  for  local  as  well  as  global  topological 
structure  within  the  network  [16,17]  and  it  is  very  useful  for 
identifying  most  important/central  nodes  [16-18].  Here,  instead 
of  working  with  a  pre-defmed  set  of  focus  genes,  we  overlay  the 
entire  data  distribution  onto  the  network,  and  bias  the  random 
walk  probabilities  based  on  the  data  values  associated  with  nodes. 
This  method,  NetWalk,  generates  a  distribution  of  Edge  Flux 
values  for  each  interaction  in  the  network,  which  then  can  be  used 
for  dynamical  network  building  or  further  statistical  analyses. 
Here,  we  describe  the  concept  of  NetWalk,  demonstrate  its 
usefulness  in  extracting  relevant  networks  compared  to  Ingenuity 
Pathway  Analysis,  and  show  the  use  of  NetWalk  results  in 
comparative  analyses  of  highlighted  networks  between  different 
conditions. 

We  tested  NetWalk  on  experimentally  derived  genomic  data 
from  breast  cancer  cells  treated  with  different  concentrations  of 
doxorubicin,  a  clinically  used  chemotherapeutic  agent.  Using 
NetWalk,  we  identify  several  previously  unreported  network 
processes  involved  in  doxorubicin-induced  cell  death.  From  these 
studies  we  propose  that  NetWalk  is  a  valuable  network  based 
analysis  tool  that  integrates  biological  high  throughput  data  with 
prior  knowledge  networks  to  define  sub-networks  of  genes  that  are 
modulated  in  a  biologically  meaningful  way.  Use  of  NetWalk  will 
greatly  facilitate  analysis  of  genomic  data. 


Pii  = 


(1) 


where  pij  is  the  transition  probability  from  node  i  to  node  J,  Wj  is  the 
experimental  value  for  node  j,  and  jVj  is  the  set  of  immediate 
downstream  neighbors  (undirected  edges  are  considered  bidirec¬ 
tional)  of  node  i.  If  there  are  no  downstream  nodes  of  the  node  i 
( I  jV) I  =G),  pij  is  set  lo p,j  =  1/  \  n\  for  ally  e  n  e  n,  where  n  is  the  set 
of  all  nodes  in  the  network.  The  relevance  score  of  each  node  in 
the  network  is  defined  by  the  probability  of  its  visitation  by  the 
random  walker,  which  is  a  function  of  both  the  local  network 
eonnectivity  as  well  the  data  values  of  nodes.  So  at  any  step  k  of 
this  “random  walk”  process,  the  probability  of  a  node  being  visited 
by  the  random  walker  is 


&i  Pji  (2) 

where  gf  is  the  probability  of  node  i  at  step  k,  pji  is  the  transition 
probability  from  node  j  to  node  i  and  jV  is  the  set  of  interacting 
neighbors  of  node  i.  This  can  be  represented  in  a  matrix  form 

g''  =  g>^->-P  (3) 

where  is  the  vector  of  probability  values  for  all  nodes  in  the 
network  at  step  k,  and  P  is  the  transition  probability  matrix  of  the 
network.  Obviously,  since  a  “walk”  can  only  be  performed  over 
adjacent  nodes,  p,jX)  only  if  nodes  i  and  j  directly  interact.  The 
expression  above  can  also  be  written  as 

gk  =  g0.pk  (4) 

where  is  the  transition  probability  matrix  raised  to  the  power  k, 
and  g”  is  the  initial  probability  distribution  over  nodes  (all  1  /  |  n  | ). 
By  the  Perron-Frobenius  theorem  for  stochastic  matrices,  as 
k—>ook—*co  (infinite  random  walk),  the  expression  above 
converges  to 


g  =  g-P  (5) 

where  g  is  the  left  eigenvector  of  P  associated  with  eigenvalue  1 
and  contains  the  final  visitation  probability  values  of  nodes. 

The  final  visitation  probabilities  of  nodes  depend  on  their  data 
values,  data  values  of  their  neighbors,  as  well  as  the  loeal  network 
connectivity.  In  order  to  further  bias  the  random  walk  towards  the 
input  data  values,  we  assigned  a  small  probability  q  that  the 
random  walker  will  return  to  its  starting  node.  Therefore,  the 
expression  for  random  walk  with  restart  is  given  by 

g  =  g(^(l-9)P+ (6) 


Methods 

Calculating  node  probabilities  using  data 

Integration  of  genomic  data  represented  by  a  vector  w  with  the 
network  data  of  interactions  between  genes  (nodes)  is  performed 
by  representing  each  interaction  (edge)  in  the  network  in  the  form 
of  a  transition  probability  based  on  the  data  values  (e.g.  mRNA 
expression  change,  phenotype  score  from  a  genetie  screen)  of 
nodes  within  the  immediate  neighborhood: 


where  q  is  a  vector  of  all  q  of  length  |  n  \  and  1  is  a  vector  of  all  1 : 
so  that  the  restart  probability  is  uniform  among  all  nodes. 
However,  we  bias  the  restart  probabilities  to  the  data  values  of 
nodes,  so  that  the  random  walker  is  more  likely  to  return  to  its 
initial  node  if  the  data  value  of  that  node  is  high. 

g  =  g(^(l-g)P+ 
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In  this  way,  the  probability  that  the  random  walker  will  restart  at 
another  node  i  is  direcdy  proportional  on  the  data  value  of  node  i, 
thereby  even  more  biasing  the  process  of  random  walk  to  the 
biological  data. 


EFy=\og\^\ 


(12) 


Calculating  node  probabilities  for  transcription  factors 

In  the  case  of  transcription  factor  -  target  gene  interactions, 
these  were  reversed  in  the  network  so  that  the  node  values  of  target 
genes  would  contribute  to  the  probabilities  of  the  transcription 
factors,  rather  than  the  other  way  around.  This  is  because  the  data 
values  of  target  genes  (i.e.  mRNA  expression  change)  are  more 
informative  of  identifying  regulation  by  transcription  factors. 

Calculating  edge  flux  values 

To  find  networks  of  interactions  between  genes  represented  in 
the  data,  we  scored  each  interaction  in  the  network  by 

eij  =  g,Pij  (8) 

where  is  the  flux  through  edge  ij  and  represents  the  score  of 
importance  of  the  interaction  based  on  the  data. 

Controlling  for  topological  bias  in  the  network 

The  node  visitation  frequencies  in  a  random  walk  direcdy  reflect 
the  relative  centralities  of  nodes  in  the  network,  and  therefore  are 
highly  biased  towards  the  local  network  topology.  Although 
biasing  the  random  walk  to  data  values  skews  the  visitation 
frequencies  towards  the  supplied  data  values,  there  is  stiU  a 
significandy  high  correlation  with  node  connectivity  values  (Figure 
SI),  which  suggests  that  the  random  walk  process  is  highly  biased 
to  the  highly  connected  hubs  in  the  network.  Therefore,  it  is 
important  to  eontrol  for  topological  bias  in  the  network  that  stems 
either  from  its  scale-free  nature  or  the  historical  bias  of  highly 
studied  genes.  In  order  to  control  for  topological  biases  in  the 
network,  we  also  calculated  background  visitation  frequencies 

gr  =  gr(^(l-9)Pr+^qxl’^j  (9) 

which  is  the  same  expression  as  in  equation  (7),  with  the  exception 
that  erjj=g^jPr_ij.  Pr  is  a  transition  probability  matrix  formed 
by  letting  w,  =  1  for  all  i.  Since  g,  is  ealculated  without  considering 
the  data  values  of  genes,  it  contains  all  the  topological  bias  in  the 
network.  Therefore,  to  obtain  relative  visitation  frequencies  of 
genes  (g'),  we  normalize  values  in  g  by  those  in  g,, 

g'i=f-  (10) 

Sri 

Relative  visitation  frequency  values  in  g'  have  minimal  eorrelation 
with  node  eentralities,  and  have  a  high  correlation  with  the 
supplied  gene  expression  measurements  (Figure  S2),  which 
indicates  that  relative  visitation  frequencies  of  nodes  are  highly 
biased  towards  the  data. 

Normalization  of  edge  flux  values  is  done  by  first  calculating 

erjj=grjPrjj  (H) 

where  e,  is  the  edge  score  distribution  vector  calculated  by  letting 
Wi=  1  for  all  i.  Then,  we  normalize  the  data-biased  edge  flux 
values  to  e,  to  obtain  normalized  Edge  Flux  of  interaction  ij 
{EFij)  EFy) 


which  gives  the  final  normalized  score  distribution  of  edges,  which 
reflects  edge  fluxes  of  nodes  relative  to  what  would  be  expeeted  by 
topology  alone  in  the  given  network. 

Data  format  and  missing  values 

Because  of  the  nature  of  random  walks  described  above,  the 
input  values  must  be  positive,  possibly  representing  ratio  of  a  test 
versus  control  sample  (e.g.  ratio  of  mRNA  expression  levels  of 
treated  to  untreated  samples).  Missing  values  in  the  network  are 
then  assigned  a  value  of  1 ,  which  represents  a  no  change  case  in  ratio 
values.  Accordingly,  the  values  of  s  are  centered  around  0,  with 
higher  values  meaning  higher  probability  relative  to  what  would 
be  expected  by  ehance  in  the  given  network  (i.e.  networks  of  high 
data  value  nodes,  e.g.  increased  gene  expression),  and  lower  values 
meaning  lower  visitation  probability  (i.e.  networks  with  low  data 
values,  e.g.  reduced  gene  expression)  (see  below). 

Effect  of  data  distribution  on  Edge  Flux  values 

In  order  to  prevent  disproportionate  skewing  of  the  node 
probabilities  with  extreme  outliers  in  the  data,  the  input  data  is 
normalized  so  that  all  w>k()  ggg  are  assigned  kg  ggg,  where  kg  ggg  is 
the  99.9th  percentile  value  of  w.  Similarly,  all  te)<kgggi  are 
assigned  kg  ggi.  With  this  proeedure,  the  final  normalized  visitation 
frequencies  of  nodes  are  highly  robust  to  differences  in  data 
distributions  and  ranges  (see  Figure  S3). 

Network  construction 

We  compiled  protein-protein  interactions  from  online  databases 
HPRD  [19]  BIND  [20],  HomoMINT  [21],  Gene  [22]  and  IntAct 
[23].  For  directed  interactions,  we  compiled  signaling  interactions 
from  KEGG  [24],  BioCarta  (http://pid.nci.mh.gov/)  and 
TRANSPATH  [25],  as  well  as  through  manual  curation  of  the 
undirected  interactions  based  on  published  literature.  Transcrip¬ 
tion  factor-target  interactions  were  obtained  from  ORegAnno  [26] 
and  TRANSFAC  [27]  databases.  This  resulted  in  a  network  of 
10,473  genes  connected  by  ~65,000  interactions. 

In  network-based  analyses  of  genomic  data,  the  analyses  and 
therefore  resultant  hypotheses  are  limited  by  the  gene  coverage  of 
the  network.  Therefore,  it  is  crucial  that  the  interaction  network 
has  as  much  gene  coverage  as  possible.  Since  our  main  goal  of 
network-based  analyses  is  identification  of  relevant  biologieal 
processes,  the  interactions  represented  in  the  network  need  not  be 
direct  physical  interactions.  For  example,  a  coneordant  increase  in 
the  expression  of  genes  involved  in  glucose  metabolism  will  not  be 
captured  in  network-based  analyses  of  direct  physical  interactions, 
as  metabolic  enzymes  within  the  same  pathway  rarely  engage  in 
direct  physical  interactions  (with  the  exception  of  multifunctional 
complexes).  Therefore,  inclusion  of  indirect  functional  interactions 
in  the  network  may  help  identify  relevant  biological  processes  that 
are  not  eaptured  by  direct  interactions  (see  network  plots  below). 
In  order  to  increase  the  coverage  of  our  network,  we  added 
functional  similarity  interactions  between  genes,  where  an 
interaction  means  that  the  genes  are  involved  in  similar  functional 
processes,  such  as  a  metabolic  pathway  (e.g.  glycolysis)  or  a  specific 
enzymatic  reaction  (e.g.  oxidation/reduction).  Functional  similar¬ 
ity  interactions  were  constructed  using  Gene  Ontology  (GO) 
annotations  [28]  as  defined  in  the  Entrez  Gene  database,  and  also 
metabolic  pathway  annotations  in  the  KEGG  database.  Any  two 
genes  sharing  a  metabolic  pathway  annotation  (but  not  signaling 
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pathways  as  they  are  already  represented  in  protein-protein 
interactions)  from  KEGG  were  assigned  an  interaction.  In  the  case 
of  GO  annotations,  two  genes  were  assigned  an  interaction  if  the 
overlap  of  their  GO  annotations  was  significant  compared  to  the 
rest  of  the  genes: 

„  irifev  Gk\ 

n 

where  s,j  is  the  significance  of  overlap  between  genes  i  and  7;  G;,  is 
the  set  of  genes  that  have  the  GO  term  t,  jV  is  the  set  of  GO  terms 
common  to  genes  i  andj,  and  n  is  the  total  number  of  genes.  If 
jj<0.001,  genes  i  andj  were  assigned  an  interaction. 

Our  final  network  contains  14,506  genes  connected  by  189,901 
interactions.  Gene  coverage  of  our  network  of  genes  in  our 
doxorubicin  dataset  is  comparable  to  that  in  the  Ingenuity 
Pathway  Analysis  (13,329  in  our  network  versus  13,880  in 
IPA). 

Microarray  analyses 

MCF7  cells  were  grown  in  DMEM  (Iiivitrogen)  supplemented 
with  10%  FBS  (Gemini)  to  near  confluency  and  treated  with  1  or 
10  pM  Doxorubicin  (Sigma).  Cells  were  collected  at  0,  6,  12  and 
24  hours  post-treatment.  Cell  lysis  and  RNA  extraction  was  done 
using  Mirvana  miRNA  isolation  kit  (Ambion)  and  amplification 
using  Illumina  TotalPrep  RNA  amplification  kit  (Ambion).  Equal 
amount  of  RNA  from  each  sample  was  hybridized  to  Illumina 
HT12  BeadChip  (Illumina).  AH  procedures  were  performed 
exactly  as  described  in  the  respective  manuals.  The  experiments 
were  repeated  in  triplicate. 


Analyses  with  IPA 

Networks  in  IPA  were  generated  using  Core  analysis  with 
indicated  data  cutoffs  for  upregulated  genes  and  using  direct 
interactions  with  the  cutoff  for  network  size  to  be  70.  Highest 
scoring  5  networks  were  merged  and  exported  as  text  files. 

Network  plotting 

All  network  plottings  were  done  using  the  gplot  function  in  the 
sm  package  for  R  (http://erzuli.ss.uci.edu/R.stuff/). 

Western  blotting 

Cells  were  treated  as  indicated  and  lysed  in  a  sample  lysis  buffer 
(50  mM  Hepes,  150  mM  NaCl,  ImM  EGTA,  10  mM  Sodium 
Pyrophosphate,  pH  7.4,  100  nM  NaF,  1.5  mM  MgCl2,  10% 
glycerol,  1%  Triton  X-100  plus  protease  inhibitors;  aprotinin, 
bestatiii,  leupeptin,  E-64,  and  pepstatin  A).  Blotting  was  done 
using  antibodies  against  p53  (Cell  Signaling),  p21  (Cell  Signaling) 
and  Actiii  (Sigma).  The  experiment  was  done  in  triplicate. 

Apoptosis  assays 

FACS;  Cells  were  treated  as  indicated  and  after  24  hours 
trypsmized,  frxed  with  70%  ethanol  at  — 20°C  for  10  minutes  and 
resuspended  in  Propidium  Iodide  solution.  FACS  analysis  was 
performed  in  the  Flow  Cytometry  core  facility  of  M.D.  Anderson 
Cancer  Center. 

Rhodamine  123  assay:  Rhodamine  123  staining  was  performed 
as  described  [29].  Briefly,  cells  were  treated  as  indicated  and  after 
24  hours,  trypsinized,  spun  down  and  resuspended  in  10  pM 
Rhodamine  123  (Invitrogen)  in  PBS  for  30  minutes.  Cells  were 
washed  in  PBS  and  analyzed  by  FACS  for  Rhodamine  123 
intensity  (green). 


Figure  1.  General  concept  of  NetWalk.  An  imaginary  network  with  artificial  experimental  data  values  is  shown  (e.g.  relative  gene  expression 
values)  on  the  left.  Node  A  was  assigned  a  value  of  5,  nodes  G,  H,  I,  J,  K  and  L  were  assigned  2,  and  all  the  other  nodes  were  assigned  1.  A  transition 
probability  matrix  P  was  constructed  using  the  input  data  values  and  the  network,  with  transition  probabilities  between  adjacent  nodes  reflecting 
their  data  values  (colors  in  the  matrix  reflect  transition  probabilities  P(i^j)  according  to  the  color  key).  Final  visitation  and  flux  values  reflect  the  level 
of  coherence  between  the  experimental  data  of  genes  and  their  relative  positioning  within  the  network.  Note  that  node  colorings  in  the  network  on 
the  right  reflect  relative  visitation  probabilities  of  nodes,  and  line  colors  of  edges  reflect  the  flux  values  according  to  the  same  color  scale. 
doi:1 0.1 371  /Journal.pcbi.1 000889.g001 
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Results 

Generating  networks  with  NetWalk 

Identifying  common  biological  roles  of  genes  whose  expression 
are  altered  in  a  microarray  experiment  is  one  of  the  most 
frequently  used  strategies  to  understand  the  underlying  biological 
processes  and  derive  hypotheses  [6,13-15,30],  This  strategy  is  also 
implicit  in  NetWalk  (Figure  1),  as  node  visitation  frequency  values 
(hence  EF  values)  calculated  by  NetWalk  are  based  on  1)  data 
values  of  nodes,  2)  data  values  of  their  network  neighbors  and  3) 
the  network  connectivity  among  neighbors.  Therefore,  a  node 
with  a  high  data  value  that  interacts  with  other  nodes  with  high 
data  values  in  the  network  will  receive  the  highest  node  visitation 
and  EF  scores.  Similarly,  a  node  with  a  low  data  value  that 
interacts  with  other  nodes  with  low  data  values  in  the  network  will 
receive  the  lowest  node  visitation  and  EF  scores. 


In  order  to  test  the  dependency  of  NetWalk  output  on  the 
provided  data,  we  performed  deletions  of  portions  of  data  and 
compared  the  resultant  visitation  frequencies  to  those  of  the 
original  dataset.  Correlation  of  node  visitation  frequencies  to  those 
of  the  full  dataset  closely  followed  the  input  data,  suggesting  that 
NetWalk  output  is  highly  dependent  on  the  supplied  data  (Figure 
S4).  However,  this  may  also  suggest  that  NetWalk  output  is  mosdy 
independent  of  the  network  connectivity.  In  order  to  test  the 
dependence  of  NetWalk  output  on  the  network  connectivity,  we 
removed  parts  of  the  network  and  performed  NetWalk  analysis  on 
the  perturbed  networks.  The  resultant  node  visitation  frequencies 
correlate  relatively  poorly  with  those  of  the  original  network 
(Figure  S5),  indicating  that  the  network  connectivity  substantially 
contributes  to  node  visitation  frequency  values.  We  also  performed 
a  similar  analysis  with  random  deletions  and  additions  of  edges, 
rather  than  nodes,  in  the  network,  and  found  a  similar  dependence 
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Figure  2.  NetWalk  analysis  of  low  and  high-dose  doxorubicin  response  in  MCF7  celis.  A)  Apoptosis  levels  In  MCF7  cells  after  24  hours  of 
stimulation  with  indicated  doses  of  doxorubicin  as  measured  by  FACS  analysis  of  DNA  content  (see  Methods).  B)  FACS  analysis  of  viable  cells  as 
indicated  by  loss  of  Rhodamine  1 23  staining(see  Methods).  C-D)  Plots  of  interactions  with  lowest(B)  and  highest  (C)  EF  values  in  samples  treated  with 
1  pM  doxorubicin  for  24  hours  relative  to  control.  Nodes  are  colored  according  to  their  gene  expression  change  relative  to  control  according  to  the 
color  key.  Edge  coloring  reflects  type  of  interaction,  PPI:  protein-protein  interaction,  TF-target:  gene  regulation,  FS:  functional  similarity.  The 
distribution  plot  of  all  EF  values  is  shows  at  the  bottom. 
doi:1 0.1 371/journal.pcbi.1 000889.g002 
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of  the  NetWalk  output  on  the  network  connectivity  (Figure  S7). 
These  analyses  demonstrate  that  NetWalk  output  is  highly 
dependent  on  both  the  supplied  data  as  well  as  the  network 
information. 

To  demonstrate  the  use  of  NetWalk  in  the  extraction  of  relevant 
networks  out  of  microarray  gene  expression  data,  we  studied  gene 
expression  profiles  of  MCF7  cells  subjected  to  sub-lethal  and  lethal 
doses  of  doxorubicin.  We  performed  microarray  gene  expression 
analysis  ofMCF7  cells  before  and  after  treatment  with  1  or  10  pM 
doxorubicin  for  6,  12  and  24  hours.  In  these  cells,  1  pM 
doxorubicin  causes  a  cell  cycle  arrest  in  S-phase,  whUe  a  10  pM 
dose  induces  cell  death  (Figure  2A-B).  A  NetWalk  analysis  of  the 
ratio  values  (treated/untreated)  for  1  pM  treatment  was  per¬ 
formed  using  q  =  0.01  (see  Methods).  The  resulting  distribution  of 
edge  flux  values,  and  plots  of  edges  with  100  highest  and  lowest  EF 
values  can  be  seen  in  Figure  2C-D.  FT  values  are  stricdy  biased 
towards  the  data,  as  the  high  and  low-end  networks  are  entirely 
composed  of  genes  with,  respectively,  increased  and  reduced 
expression  levels.  In  the  Figure  2D,  interactions  in  the  cluster 
made  of  GLS,  GLS2,  P4HA2,  ODCl  and  PRODH  genes 
(arginine  and  proline  metabolism)  have  the  highest  EF  scores 
due  to  both  their  high  data  values  and  tight  interconnections  with 
each  other.  Similarly,  in  the  low-score  network  in  Figure  2C, 
interactions  in  the  cluster  containing  NDC80,  CENPK,  CBXl, 
CENPA  and  SGOLl  (centriole  components)  have  the  lowest  EF 
scores.  Nodes  with  moderate  values  that  are  in  close  proximity  to 
other  high  value  nodes  within  a  tightly  connected  neighborhood 
will  also  get  high  scores,  as  is  seen  with  TP53  in  Figure  2B. 


In  order  to  demonstrate  that  the  p53  network  extracted  by 
NetWalk  is  not  an  artifact  of  highly  connected  subnetworks,  we 
performed  a  NetWalk  analysis  of  baseline  expression  profile  of 
MCF7  cells  relative  to  other  breast  cancer  cells  as  reported  by 
Neve  et  al  [31].  The  most  significantly  upregulated  networks  in 
MCF7  cells  relative  to  the  rest  of  53  breast  cancer  cells  are  those 
involved  in  the  Estrogen  Receptor  signaling  (Figure  S6),  a  well- 
characterized  dominant  pathway  in  the  estrogen  receptor  positive 
MCF7  cells.  This  analysis  shows  that  NetWalk  output  does  indeed 
reflect  accurate  quantification  of  highly  biologically  relevant 
networks  based  on  the  supplied  data. 

EF  scores  are  highly  coherent  with  data  values 

Contrary  to  the  seed-based  network  building  methods,  NetWalk 
works  with  the  whole  data  distribution  and  so  does  not  require 
assignment  of  pre-defined  cutoffs  or  focus  gene  sets.  NetWalk 
procedure  simply  translates  the  gene-centric  data  values  to 
corresponding  interaction  scores  based  on  the  coherence  of  the 
gene  values  with  those  in  the  local  network  neighborhood  as  well  as 
the  local  interaction  pattern  in  the  network.  Therefore,  the  results 
can  be  viewed  at  any  user  defined  cutoff  value  for  flexible  generation 
of  networks  with  highly  coherent  node  values.  The  distribution  of 
input  node  values  and  sample  networks  with  different  EF  cutoffs 
shows  that  the  node  values  within  networks  are  highly  coherent 
across  a  wide  range  of  EF  score  cutoffs,  which  allows  for  high- 
confidence  hypothesis  generation  about  activated  and  inactivated 
network  processes  in  response  to  DNA  damage  (Figure  3A-B).  In 
comparison,  the  distribution  of  data  values  of  nodes  in  the  networks 


A 


NetWalk 


Ingenuity  Pathway  Analysis  ^ 


PPI  - TF-target  FS 


Log2 

ratio 


3. 

_ 

3- 

2  . 

2- 

1  - 

te  ^9 

1  - 

0 . 

“  81 

0- 

164 

359 

-2. 

-2- 

-3. 

-3. 

T,-  T,  ''  T,- 


<>• 


EF  cutoff 


Data  cutoff  for  focus  genes 


NetWalk 


EF  cutoff 


Ingenuity  Pathway  Analysis 

3.5 

0.5 
0.15 

-0.1 

-2.5 

Data  cutoff  for  focus  genes 


^SPZ  LCPl  ^ 

M  IGFBP5  #  L 

0^TRA1  ^  \  • 

:AM46A  ^  f|„„,  = 


Log2  ratio 


Figure  3.  Comparison  of  coherence  of  node  vaiues  in  highest  scoring  networks.  A)  Boxplots  of  gene  expression  change  values  (1  pM  DOX, 
24  hours  relative  to  control)  of  nodes  in  networks  generated  by  different  cutoffs  of  EF  values,  or  in  networks  generated  by  Ingenuity  Pathway 
Analysis  software  using  different  gene  expression  value  cutoffs  for  the  focus  gene  set  (see  Methods).  B)  Fleatmaps  showing  position  of  genes  in  the 
networks  in  A  in  the  whole  data  distribution.  Positions  of  genes  in  the  respective  networks  are  indicated  by  a  white  line.  C)  A  network  of  nodes 
generated  by  Ingenuity  Pathway  Analysis  software  with  focus  gene  set  using  1.5  as  cutoff.  Since  original  network  plots  in  IPA  lack  node  colorings  for 
intermediate  genes  (non-focus  genes),  we  extracted  all  nodes  in  the  IPA-generated  network  and  re-plotted  them  using  our  network,  where  we 
colored  all  nodes  by  their  gene  expression  change. 
doi:1 0.1 371/joumal.pcbi.1 000889.g003 
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generated  by  Ingenuity  Pathway  Analysis,  which  takes  a  focus  gene 
list  as  input  to  build  relevant  networks,  includes  nodes  with 
incoherent  data  values  (see  Figure  3A-C),  which  reduces  confidence 
in  the  relevance  of  the  generated  networks  to  the  data.  The  network 
of  124  genes  retrieved  by  IPA  using  a  cutolf  of  >  1.5  (60  focus  genes) 
contains  many  genes  witli  reduced  expression  values  (Figure  3C), 
which  were  included  in  the  network  by  the  virtue  of  their 
connectivities  but  not  data  values.  Consequently,  the  resulting 
network  is  not  entirely  representative  of  upregulated  network 
processes  in  response  to  doxorubicin.  Moreover,  none  of  the 
networks  identified  by  IPA  contain  all  the  genes  involved  in 
arginine-proline  metabolism  (compare  Figures  2D  and  3C)  or  any 
genes  involved  in  the  nucleotide  metabolism  that  were  retrieved  by 
NetWalk  (see  eluster  in  Figure  2D  containing  RRM2B,  AKl, 
POLR2A  and  NME2;  compare  with  Figure  3C),  demonstrating 
inability  of  seed-based  methods  to  identify  subnetworks  witli  more 
subtle  yet  coherent  gene  expression  values. 

Statistical  analyses  using  NetWalk  output  to  elucidate 
p53-nnediated  response  to  DNA  damage 

As  stated  earlier,  an  important  feature  of  NetWalk  is  that  the  result 
is  not  a  single  or  a  collection  of  static  networks,  but  a  whole  distribution 


of  numerical  edge  scores.  In  addition  to  their  use  for  dynamical 
network  construction  of  dilferent  sizes  based  on  the  user  preference, 
these  can  be  further  subjected  to  standard  statistical  tests  for  a  more 
detailed  analysis.  The  heatmap  of  interactions  with  highest  and  lowest 
EF  scores  in  each  condition  in  our  microarray  dataset  is  shown  in 
Figure  4A.  As  opposed  to  clustering  with  traditional  heatmaps  of  gene 
expression  values  where  cluster  membership  of  genes  is  exclusive, 
here,  a  gene  can  appear  in  several  dilferent  clusters  but  aU  with 
different  interactions.  So,  analysis  of  expression  with  iiF  scores  enables 
studying  specific  functions  (i.e.  interactions)  of  genes  rather  than  their 
individual  expression  values.  The  heatmap  shows  that  the  activation 
and/ or  inactivation  of  several  networks  is  specific  to  low-  or  high-dose 
doxorubicin  treatment.  The  cluster  K3,  for  example,  is  activated  in 
response  to  high-dose  doxorubicin,  while  K4  is  more  specifically 
activated  in  response  low-dose  doxorubicin.  A  plot  of  interactions  in 
K3  reveals  several  metabolic  pathways  specifically  activated  in  the 
high-dose  treatment,  including  glycolysis,  acetyl  coenzyme  A  synthesis, 
arginine /proHne  metabolism  and  the  mitochondrial  electron  transport 
chain  (Figure  4G).  There  is  also  a  p53-centered  subnetwork  containing 
several  previously  identified  p53  target  genes.  The  plot  of  interactions 
in  K4  shows  an  extensive  p53-centered  network  composed  mostly  of 
cell  cycle  regulatory  proteins  (e.g.  CDKNIA  (p21CIP)  and  several 
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Figure  4.  Clustering  analysis  of  EF  values  in  each  condition.  A)  Heatmap  of  highest  and  lowest  EF  values  In  each  condition.  Clustering  was 
done  using  Ward's  method  in  R.  B-C)  Networks  corresponding  to  K3  (B)  and  K4  (C).  Node  colorings  are  according  to  24h  of  1  and  10  pM  DOX 
treatments,  respectively.  Edge  colorings  are  as  in  Figure  2C. 
doi:10.1371/journal.pcbi.1000889.g004 
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GADD45  genes)  (Figure  4B).  Interestingly,  although  p53  appears  in 
both  K3  and  K4,  its  functions  seem  to  be  completely  dUferent  in  the 
low  and  high  dose  treatments.  In  response  to  low-dose  doxorubicin, 
p53  is  involved  in  tire  activation  of  cell  cycle  regulatory  proteins,  while 
under  high-dose,  it  activates  other  targets,  such  as  TMSB4X. 
Moreover,  p53-target  genes  in  cell  cycle  regulation  in  K3  are 
inactivated  in  high-dose  doxorubicin  (Figure  5A-B),  which  we 
confirmed  by  western  blotting  (Figure  5C),  suggesting  that  p53  may 
act  as  a  transcriptional  activator  of  these  genes  during  ceU  cycle  arrest 
but  as  a  repressor  during  apoptosis.  This  trend  suggests  not  only  that 
p53  may  engage  different  targets  during  ceU  cycle  arrest  and  apoptosis, 
but  also  shows  dual  behavior  of  p53  under  these  conditions.  In 
addition,  this  analysis  shows  that  energy  and  amino  acid  metabolisms 
may  play  an  important  role  in  doxorubicin-induced  ceU  death.  Here, 
clustering  analysis  using  NetWalk  results  facilitated  comparison  of 
networks,  rather  than  genes,  between  different  conditions,  leading  to 
the  identification  of  differential  activities  of  p53  under  low  and  high- 
dose  doxorubicin  treatment. 

Discussion 

NetWalk  algorithm 

Analyses  of  high  content  data  within  die  context  of  biological 
interactions  aUow  for  high  confidence  hypothesis  generation  about 
mechanisms  involved  in  the  studied  process.  While  some  work  has 


been  done  on  inferring  novel  causal  interactions  out  of  data  [32- 
34],  the  most  popular  method  is  integration  of  data  with  prior 
knowledge  on  interactions  to  extract  most  relevant  networks 
highlighted  by  the  data.  Most  of  the  methods  for  extracting 
relevant  networks  rely  on  finding  genes  in  the  network  that  are 
most  central  to  connecting  the  genes  of  interest  identified  from  the 
data.  The  random  walk  process  in  NetWalk  also  scores  most 
central  genes  in  the  network.  However,  rather  than  working  on  a 
small  set  of  focus  genes,  NetWalk  scores  centralities  of  all  genes  in 
the  network  based  on  the  whole  data  distribution.  This  is  achieved 
by  biasing  the  random  walk  transition  probabilities  between  genes 
to  their  corresponding  data  values,  which  allows  for  higher 
visitation  probabilities  of  nodes  with  high  data  values  and  lower 
probabilities  of  nodes  with  low  data  values.  Since  visitation 
probabilities  of  nodes  in  a  random  walk  are  also  dependent  on  the 
visitation  probabilities  of  their  network  neighbors,  nodes  with 
relatively  moderate  data  values  associated  with  those  with  higher 
values  have  the  potential  of  high  visitation  by  the  random  walk. 
Therefore,  NetWalk  scores  nodes  based  on  their  data  values,  data 
values  of  their  neighbors  and  local  network  connectivity. 

Unlike  most  of  the  existing  methods  for  network  extraction, 
which  typically  give  a  set  of  networks  as  outputs  [1,9],  NetWalk 
gives  a  distribution  of  EF  values  that  allows  for  flexibility  in 
network  construction  using  different  EF  cutoffs.  In  addition,  EF 
scores  can  be  subjected  to  further  statistical  tests  for  comparative 
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Figure  5.  p53-target  cell  cycle  regulatory  genes  are  specifically  repressed  during  apoptosis.  A-B)  Network  plot  of  Interactions  In  K3  (see 
Figure  4)  related  to  cell  cycle  regulation.  Nodes  colored  according  to  gene  expression  changes  at  10  (A)  or  1  pM  (B)  doxorubicin  treatment.  C) 
Western  blots  of  p53,  p21  (CDKN1A  gene  product)  protein  levels  over  a  time  course  after  1  and  lOpM  doxorubicin  treatment.  Actin  levels  shown  as 
control. 
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studies,  allowing  for  network-based  comparisons  of  multiple 
conditions. 

Another  important  feature  of  NetWalk  is  its  computational 
efficiency.  We  implemented  a  sparse  matrix  representation  and 
multiplication,  which  allows  for  NetWalk  to  be  run  on  a  standard 
PC  equipped  with  1  gigabytes  of  memory.  In  our  case  (PC  with 
Intel  Xeon  Quad  processor),  NetWalk  run  of  a  single  dataset  in 
our  network  (14,506  nodes  and  ~  190,000  interactions)  took  about 
2-3  seconds. 

NetWalk  analysis  of  the  experimental  data  revealed  a  significant 
activation  of  networks  involved  in  energy  metabolism,  including 
the  glycolytic  and  mitochondrial  electron  transport  chain  compo¬ 
nents.  At  least  one  member  of  the  electron  transport  chain, 
SC02A,  has  been  previously  shown  to  be  a  p53  target  [35], 
suggesting  that  some,  if  not  most,  of  the  metabolic  genes  activated 
in  response  to  10  uM  doxorubicin  may  be  p53  target  genes.  A 
specific  and  extensive  activation  of  the  energy  metabolism  during 
p53-mediated  apoptosis  has  not  been  previously  reported,  and 
therefore  it  is  a  novel  finding  facilitated  by  NetWalk  analysis. 
Network  analysis  of  experimental  data  using  NetWalk  revealed 
dual  behavior  of  p53  under  sublethal  and  lethal  doses  of  DNA 
damage.  In  response  to  sublethal  doses  of  DNA  damaging  agents, 
p53  activates  a  cell  cycle  arrest  program  centered  around  CDK 
inhibitors  p21  (CDKNIA)  and  GADD45,  as  well  as  several  pro- 
apoptotic  genes,  such  as  BAX  and  APAF 1 .  However  under  lethal 
doses,  p53  represses  the  cell  cycle  arrest  machinery  and  activates 
an  entirely  different  program.  Use  of  NetWalk  analysis  allows 
network  based  analysis  of  genomic  data  as  well  as  high  confidence 
hypothesis  generation  and  is  a  valuable  tool  in  post-genomic 
anlaysis. 

Supporting  Information 

Figure  SI  Correlation  of  node  visitation  frequencies  with  node 
connectivities  (left)  and  original  data  values  (right)  before 
normalization  for  network  topology  (see  Text).  R2  values  show 
squared  Spearman’s  rank  correlation  coefficients. 

Found  at:  doi:  10.1 37 1/journal.pcbi.  1000889. sOOl  (0.08  MB  PDF) 

Figure  S2  Same  as  in  Figure  SI,  but  after  normalization  for 
network  topological  bias  (see  Text). 

Found  at:  doi:  10. 137 1/journal.pcbi.  1000889. s002  (0.09  MB  PDF) 

Figure  S3  Effect  of  data  range  on  NetWalk  output.  Original 
mRNA  expression  changes  in  response  to  1  uM  doxorubicin  (ratio) 
were  log2-transformed  (di),  and  then  transformed  back  by  taking 
exponential  with  different  expansion  factors  f,  a_i  =  F'(d_i )  where 
a_i  is  the  transformed  value  of  gene  i,  di  is  the  log2-transformed 
original  ratio  value  of  gene  i  and  f  is  the  expansion  factor. 
Distributions  of  the  transformed  data  with  different  expansion 
factors  are  shown  in  A.  Numbers  above  each  distribution  chart 
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shows  the  expansion  factor.  Expansion  factor  of  2  corresponds  to 
the  original  distribution.  B)  Correlation  of  visitation  frequencies 
corresponding  to  each  transformed  dataset  with  the  original 
visitation  frequency  values  (i.e.  f=2).  C)  Correlation  of  visitation 
frequency  values  for  each  expansion  faetor  with  the  supplied 
transformed  data  values.  D-E)  Highest  scoring  interactions 
calculated  using  transformed  datasets  with  expansion  factor  D) 
1.25  and  E)  5.  Note  that  the  two  networks  are  highly  similar  ~ 
95%  same  node  composition). 

Found  at:  doi:  10. 137 1/journal.pcbi.  1000889. s003  (0.23  MB  PDF) 

Figure  S4  Effect  of  data  deletions  on  NetWalk  output.  Portions 
of  data  were  deleted  and  node  visitation  frequencies  were 
caleulated  by  NetWalk.  Shown  are  the  correlations  of  each 
deletion  with  the  original  node  visitation  frequency  values  (i.e.  0% 
deletion). 

Found  at:  doi:  10.1 37 1/journal.pcbi.l 000889. s004  (0.03  MB  PDF) 

Figure  S5  Effect  of  network  deletions  on  NetWalk  output.  A 
network  corresponding  to  690  nodes  (highest  scoring  interactions 
in  luM  doxorubicin  dataset)  was  selected  and  nodes  were  deleted 
at  random.  Correlation  of  resulting  node  visitation  frequency 
values  with  the  original  unperturbed  network  of  690  nodes  is 
shown  (blaek).  In  addition,  corresponding  correlations  with  the 
node  degrees  in  eaeh  networks  are  also  shown.  Note  that  although 
total  number  of  interactions  are  relatively  similar  in  each  deletion, 
the  NetWalk  output  changes  substantially  due  to  ehanges  in  the 
local  network  connectivities. 

Found  at:  doi:  10. 137 1/journal.pcbi.  1000889. s005  (0.07  MB  PDF) 

Figure  S6  Highest  scoring  networks  corresponding  to  estrogen 
receptor  positive  MCF7  cells  relative  to  58  other  breast  cancer  cell 
lines.  ESRl  (estrogen  receptor  gene)  is  highlighted. 

Found  at:  doi:  10. 137 1/journal.pcbi.l 000889. s006  (0.26  MB  PDF) 

Figure  S7  Effect  of  edge  perturbations  on  NetWalk  output.  A 
random  network  corresponding  to  755  nodes  was  selected  out  of 
the  whole  network  (3721  interactions).  A)  Edges  were  deleted  at 
random  and  correlation  of  the  resultant  node  visitation  frequencies 
were  compared  to  that  of  unperturbed  network.  B)  To  the  network 
in  A  where  50%  of  all  edges  were  removed,  we  added  random 
interaetions  between  random  pairs  of  nodes  and  compared  the 
resultant  NetWalk  output  with  the  initial  NetWalk  output  at  50% 
deleted  network. 

Found  at:  doi:10. 137 1/journal.pcbi.  1000889. s007  (0.09  MB  JPG) 
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Abstract 

Targeted  therapeutics  hold  tremendous  promise  in  inhibiting  cancer  cell  proliferation.  However,  targeting 
proteins  individually  can  be  compensated  for  by  bypass  mechanisms  and  activation  of  regulatory  loops.  Design¬ 
ing  optimal  therapeutic  combinations  must  therefore  take  into  consideration  the  complex  dynamic  networks  in 
the  cell.  In  this  study,  we  analyzed  the  insulin-like  growth  factor  (IGF-1)  signaling  network  in  the  MDA-MB231 
breast  cancer  cell  line.  We  used  reverse-phase  protein  array  to  measure  the  transient  changes  in  the  phosphor¬ 
ylation  of  proteins  after  IGF-1  stimulation.  We  developed  a  computational  procedure  that  integrated  mass  ac¬ 
tion  modeling  with  particle  swarm  optimization  to  train  the  model  against  the  experimental  data  and  infer  the 
unknown  model  parameters.  The  trained  model  was  used  to  predict  how  targeting  individual  signaling  proteins 
altered  the  rest  of  the  network  and  identify  drug  combinations  that  minimally  increased  phosphorylation  of 
other  proteins  elsewhere  in  the  network.  Experimental  testing  of  the  modeling  predictions  showed  that  optimal 
drug  combinations  inhibited  cell  signaling  and  proliferation,  whereas  nonoptimal  combination  of  inhibitors 
increased  phosphorylation  of  nontargeted  proteins  and  rescued  cells  from  cell  death.  The  integrative  approach 
described  here  is  useful  for  generating  experimental  intervention  strategies  that  could  optimize  drug  combina¬ 
tions  and  discover  novel  pharmacologic  targets  for  cancer  therapy.  Cancer  Res:  70(17):  OFl-11.  ©2010  AACR. 


Major  Findings 

Simple  and  reliable  strategies  are  needed  to  identify 
optimal  combinations  of  molecular  targeted  drugs  to 
treat  individual  cancer  patients,  to  realize  the  fullest 
potential  of  a  targeted  therapeutic  approach. 


Introduction 

Cell  signaling  networks  are  complex  systems  that  integrate 
information  from  the  cellular  environment  (1-5).  Maps  of 
complex  networks  were  derived  by  interconnecting  the  indi¬ 
vidual  pathways  obtained  from  experimental  data  (6,  7). 
These  studies  revealed  that  signaling  networks  contain  nu¬ 
merous  features,  such  as  feedback  and  feedforward  loops 
(8,  9),  which  render  it  virtually  impossible  for  the  human 
mind  to  decipher  how  signals  are  integrated  within  the  path- 
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ways.  Thus,  computational  approaches  are  needed  to  eluci¬ 
date  the  regulatory  properties  of  signaling  networks  (10-12). 

Several  groups  have  used  ordinary  differential  equations 
(ODE)  to  analyze  the  dynamics  of  signaling  networks  and 
generate  experimentally  testable  predictions  (6,  13-17).  The 
use  of  mass  action  ODE  modeling,  however,  is  impaired 
because  of  incomplete  knowledge  about  the  concentrations 
and  kinetics  of  signaling  intermediates. 

Inferring  the  parameters  for  mass  action  modeling  in  sig¬ 
naling  networks  is  challenging.  The  most  common  approach 
is  to  obtain  parameters  from  the  literature  and  fit  the  models 
to  the  experimental  data  to  infer  those  that  remain  unknown 
(6,  13,  18-24).  Unfortunately,  the  kinetic  parameters  reported 
in  the  literature  may  differ  by  orders  of  magnitude,  depend¬ 
ing  on  experimental  conditions.  Thus,  it  is  difficult  to  deter¬ 
mine  whether  discrepancies  between  computational  and 
experimental  data  are  due  to  inaccurate  measures  or  incom¬ 
plete  modeling.  Parameter  estimation  can  be  effectively 
accomplished  using  optimization  methods,  which  enable 
quantitative  model  fitting  to  experimental  data  (25-31). 
However,  the  experimental  techniques  used  to  measure  the 
activity  of  signaling  proteins  mainly  provide  qualitative  or 
semiquantitative  data.  Optimization  strategies  are  thus  need¬ 
ed  to  identify  sets  of  model  parameters  that  equally  fit  the 
qualitative  experimental  data. 

Another  challenge  in  the  analysis  of  signaling  networks  is 
the  identification  of  optimal  target  combinations.  The  most 
common  methods  of  computational  target  identification  are 
based  on  formulating  mathematical  models  and  designing 
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Quick  Guide  To  Equations  And  Assumptions 

Mass  action  modeling 

The  dynamics  of  the  IGFR  network  in  MDA-MB231  cells  were  described  using  a  mass  action  model  of  ODEs  formulated  as  follows: 
Step  1:  The  pathways  comprising  the  IGFR  network  were  reconstructed  into  a  set  of  chemical  reactions  that  described  the 
simplified  mechanisms  of  activation  and  inhibition  of  relevant  proteins.  For  example,  mitogen-activated  protein  kinase  (MAPK) 
phosphorylation  was  assumed  to  be  catalyzed  by  MAPK  kinase  [MAP/  extracellular  signal-regulated  kinase  kinase  (MEK)/MAPKK] 
and  occurred  through  an  enzymatic  reaction: 


MAPK  +  MEK* 


'C- 


■  MAPK*  +  MEK* 


(1) 


In  equation  1,  k/,  k{,  and  K2  are  the  forward,  reverse,  and  dissociation  kinetic  rate  constants,  respectively. 

Step  2:  The  set  of  chemical  reactions  was  transformed  into  a  system  of  coupled  ODEs  by  assuming  that  the  dynamics  of  the 
IGFR  network  obeyed  the  law  of  mass  action.  Specifically,  the  accumulation  rate  of  the  concentration  of  the  fth  signaling  inter¬ 
mediate  was  expressed  as  the  difference  between  its  net  rates  of  production  (rp,,)  and  consumption  Thus,  the  accumulation 
rate  of  the  concentration  of  MEK*  was  expressed  as  follows: 


dMEK* 

dt 


r p,MEIC 


rc,MEK> 


-k([MAPK][MEK*]  +  {k[  +  K2)[C] 


(2) 


In  equation  2,  [MAPK],  [MEK\  and  [C]  denote  the  concentration  of  MAPK,  MEK*  and  C,  respectively. 

The  list  of  chemical  reactions  that  described  the  consensus  activation  and  inhibition  mechanisms  of  proteins  involved  in  the 
IGFR  network  and  the  corresponding  system  of  ODEs  are  listed  in  Supplementary  Material  SI.  To  implement  mass  action  mod¬ 
eling,  it  was  necessary  to  infer  the  unknown  model  parameters,  which  are  the  kinetic  rate  constants  and  the  initial  concentra¬ 
tions  of  the  proteins.  In  this  regard,  we  trained  the  mass  action  model  against  transient  data  measured  by  RPPA  using  PSO.  We 
selected  PSO  because  of  its  superior  ability  to  converge  to  more  optimal  solutions  compared  with  other  optimization  algorithms 
(see  Discussion). 


Particle  swarm  optimization 

PSO  is  a  stochastic  algorithm  that  mimics  the  behavior  of  swarms  of  animals  that  search  for  food  (36).  Particles  in  the  swarm 
have  a  position  Xy,  a  velocity  Vy,  and  a  fitness  fp  in  which  i  and  j  represent  the  number  of  particles  and  the  dimension  of  the  space 
solution,  respectively.  Each  particle  remembers  its  best  position  Xif'  locally  and  the  best  position  xf  globally  reached  by  the  entire 
swarm.  During  the  iterative  search  for  food,  particles  update  their  position  and  velocities  to  improve  their  fitness  according  to  the 
following  rules: 

Uij{t  1)  =  -h  Ciri  [a;^.(t)  -  a:y(t)]  -f  C2r2  -  a;y(t)] 


Xij{t  -I-  1)  =  Xij{t)  -I- 


(3) 


In  equations  3,  u  is  the  inertia  factor;  r^  and  r2  are  two  random  numbers  uniformly  distributed  in  the  interval  [0,1];  and  Ci  and  C2 
are  the  coefficients  of  self-recognition  and  social  component  (see  ref  37  and  Supplementary  Material  S2  for  details  on  parameters 
in  equations  3). 

In  our  settings,  the  particle  positions  represented  the  unknown  parameter  values  used  in  the  mass  action  model  to  generate 
computationally  the  time  courses  of  proteins  that  are  measured  by  RPPA;  the  particle  velocities  denoted  the  extent  to  which  the 
parameter  values  were  iteratively  changed;  and  the  particle  fitness  was  defined  as  the  distance  between  the  time  courses  of  pro¬ 
teins  experimentally  and  computationally  measured.  Model  parameters  were  randomly  initialized  and  iteratively  changed  accord¬ 
ing  to  equation  3  until  the  distance  between  the  time  courses  of  the  measured  and  predicted  proteins  was  minimal  (i.e.,  optimal 
fitness).  The  distance  between  computed  and  measured  time  courses  was  evaluated  using  the  SD-weighted  square  error; 


j=i  i=i 


(4) 


In  equation  4,  y  -  and  a{y  ij")  represent  the  mean  and  SD,  respectively,  of  the  proteins  measured  by  RPPA,  whereas  yP  denotes 
the  protein  levels  computed  using  the  mass  action  model.  Moreover,  s  represents  the  total  number  of  data  points  comprising  a 
single  time  course,  and  r  is  the  total  number  of  time  courses.  PSO  was  implemented  to  minimize  the  SD-weighted  square  error  and 
train  the  mass  action  model  against  RPPA  data  to  estimate  the  unknown  model  parameters. 
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intervention  strategies  through  environmental,  genetic,  and 
signaling  perturbations  (32-34).  This  approach  can  predict 
the  effect  of  available  drugs  on  signaling  network  dynamics, 
but  it  does  not  facilitate  the  search  for  drug  combinations 
that  would  optimally  inhibit  aberrant  signaling.  Another 
strategy  is  to  integrate  mass  action  modeling  with  simulated 
annealing  into  a  multiple-target  optimal  intervention  (35). 
Because  this  approach  is  computationally  expensive,  alterna¬ 
tive  procedures  are  needed  to  enable  the  rapid  search  for  tar¬ 
gets  in  disease-related  networks. 

In  this  study,  we  used  reverse-phase  protein  array 
(RPPA)  to  measure  the  transient  response  of  the  MDA- 
MB231  breast  cancer  cell  line  after  stimulation  by  insu¬ 
lin-like  growth  factor  (IGF-1).  The  reason  for  choosing 
the  IGF  receptor  (IGFR)  network  is  2-fold:  There  is  a  large 
amount  of  experimental  data  and  biological  resources  al¬ 
lowing  us  to  build  a  consensus  network  and  experimentally 
test  it;  components  of  this  network  are  being  targeted  in 
several  clinical  trials  for  cancer  therapy,  thus  having  clini¬ 
cal  applicability.  We  developed  a  computational  procedure 
that  integrated  mass  action  modeling  with  particle  swarm 
optimization  (PSO)  to  train  the  model  against  normalized 
time  courses  of  phosphorylated  proteins  in  MDA-MB231 
cells  and  infer  sets  of  unknown  model  parameters  that 
equally  fit  the  measured  data.  The  trained  mass  action 
model  was  used  to  predict  the  effect  of  a  targeted  pertur¬ 
bation  and  tested  using  experimental  data.  The  trained  and 
tested  mass  action  model  was  then  used  to  identify  the 
most  influential  molecules  responsible  for  aberrant  ceU  sig¬ 
naling  and  determine  the  optimal  combinations  of  inhibi¬ 
tors  and  small  interfering  RNAs  for  inhibiting  abnormal 
signaling  in  MDA-MB231  cells.  Immunoblotting  and  cell  vi¬ 
ability  assay  were  then  used  to  test  and  validate  the  effect 
of  drug  combinations  predicted  by  the  mass  action  model. 
Our  integrative  approach  is  useful  for  generating  experi¬ 
mental  intervention  strategies  that  could  optimize  drug 
combinations  and  discovering  novel  pharmacologic  targets 
for  cancer  therapy. 

Materials  and  Methods 

Cell  culture  and  stimulation 

The  human  MDA-MB231  breast  cancer  cell  line  (K-Ras 
and  B-Raf  mutants)  was  purchased  from  the  America  Type 
Culture  Collection  (ATCC).  The  cell  line  was  validated  by 
STR  DNA  fingerprinting  using  the  AmpFlSTR  Identifiler  kit 
(Applied  Biosystems).  The  STR  profiles  were  compared  with 
known  ATCC  fingerprints  (http://www.atcc.org/)  and  to  the 
Cell  Line  Integrated  Molecular  Authentication  database  ver¬ 
sion  0.1.200808  (38).  Cells  were  cultured  in  RPMI  supplemen¬ 
ted  with  5%  fetal  bovine  serum.  Cells  were  serum  starved 
overnight  and  then  subjected  to  treatment  with  75  ng/mL 
IGF-1  (Cell  Signaling  Technology).  For  RPPA,  cells  were 
pretreated  with  10  pmol/L  U0126  (Promega)  for  4  hours, 
followed  by  IGF-1  stimulation  for  5,  15,  30,  60,  90,  or  120  min¬ 
utes.  For  immunoblotting,  cells  were  pretreated  with 
10  pmol/L  U0126,  50  pmol/L  LY294002  (Calbiochem- 
Nova-Biochem  Corp.)  and  50  nmol/L  rapamycin  (Calbiochem- 


Nova-Biochem  Corp.),  individually  or  combined,  for  1  hour, 
followed  by  IGF-1  stimulation  for  5  or  60  minutes.  For  RPPA 
and  immunoblotting,  controls  were  incubated  for  the 
corresponding  times  with  DMSO. 

Antibodies 

The  following  antibodies  were  used  for  RPPA  and  immu¬ 
noblotting:  anti-phospho-MAPK  (T202/Y204),  anti-phos- 
pho-GSK3  (S21/S9),  anti-phospho-AKT  (ser473),  anti- 
phospho-TSC2  (T1462),  anti-phospho-mammalian  target 
of  rapamycin  (mTOR;  S2448),  anti-phospho-P70S6K 
(T389),  anti-MAPK  (p44/42),  anti-AKT,  anti-TSC2  (28A7), 
anti-mTOR,  anti-P70S6K,  and  anti-actin  were  from  CeU  Sig¬ 
naling  Technology;  and  anti-GSK3  was  from  Santa  Cruz 
Biotechnology,  Inc. 

Immunoblotting 

Immunoblotting  was  performed  using  standard  procedures. 

Reverse-phase  protein  array 

Serial  dUuted  lysates  were  arrayed  on  nitrocellulose-coated 
FAST  slides  (Whatman)  using  the  Aushon  2470  Arrayer 
(Aushon  Biosystems).  Each  slide  was  probed  with  a  primary 
antibody  plus  a  biotin-conjugated  secondary  antibody.  The 
signal  was  amplified  using  the  DakoCytomation-catalyzed 
system  (DAKO)  and  visuahzed  using  a  3,3'-diaminobenzidine 
colorimetric  reaction.  The  slides  were  scanned,  analyzed,  and 
quantified  using  the  customized  Microvigene  software  (Vig- 
eneTech,  Inc.)  to  measure  spot  intensity.  Each  dilution  curve 
was  fitted  vdth  the  logistic  model  “Supercurve  Fitting”  (39). 
The  mean  values  of  the  protein  levels  in  the  nonstimulated 
ceUs  were  used  to  normahze  the  time  courses  of  the  phos¬ 
phorylated  proteins  measured  in  IGF-1 -stimulated  cells. 

Crystal  violet  ceU  viability  assay 

Viability  assay  was  performed  using  standard  procedures. 
CeUs  were  treated  for  3  days  with  the  foUowing:  U0126  (concen¬ 
tration  of  0.1-100  pmol/L),  LY294002  (concentration  of  0.1- 100 
pmol/L),  or  rapamycin  (concentration  of  0.1-100  nmol/L); 
combination  of  U0126  (concentration  of  0.5-50  pmol/L)  and 
LY294002  [fixed  at  its  quarter  maximal  effective  concentration 
(EC2s)  value  of  3.8  pmol/L]  or  rapamycin  (fixed  at  its  EC2S  val¬ 
ue  of  0.1  nmol/L);  and  combination  of  U0126  (fixed  at  its  EC25 
value  of  3.5  pmol/L)  and  rapamycin  (concentration  of  0.5  to  50 
nmol/L).  Corresponding  controls  were  incubated  with  DMSO. 
The  EC25  of  each  inhibitor  was  estimated  (Supplementary  Ma¬ 
terial  S3)  using  Microsoft  GraphPad  Prism. 

Computational  procedures 

Computational  procedures  are  described  in  Supplemen¬ 
tary  Material  S2. 

Results 

IGFR-l  signaling  detection  by  RPPA 

Figure  lA  shows  the  IGFR  signaling  network  in  the  MDA- 
MB231  ceU  line.  Signal  transduction  is  originated  when  IGF-1 
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Figure  1.  A,  IGFR  signaling  network  topology  in  the  MDA-MB231  cell  line.  Nodes,  proteins;  edges,  protein  interactions;  red  arrows,  protein  activation; 
and  green  plungers,  protein  inactivation.  B,  protein  profiles  were  measured  on  RPPA  in  triplicate.  The  mean  protein  profiles  of  non-IGF-1 -stimulated 
cells  were  used  as  controls  for  normalization.  Circles,  mean  of  the  normalized  protein  profiles;  bars,  SD.  Normalized  time  courses  were  computationally 
evaluated  using  the  trained  mass  action  model.  Solid  red  lines,  the  mean  time  courses  of  the  trajectories  that  equally  fit  the  experimental  data;  dashed 
black  lines,  the  fitting  variability.  C,  histogram  of  model  parameter  regimens  clustered  according  to  the  coefficient  of  variation  (CV  =  SD/mean). 
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complexes  with  IGFR  and  triggers  IGFR  autophosphorylation 
(40).  Phosphorylated  IGFR  propagates  the  signal  downstream 
through  the  MARK  and  phosphoinositide-3-kinase  (PI3K) 
pathways,  and  leads  to  MAPK  and  protein  kinase  B  (PKB/ 
AKT)  phosphorylation  (4,  5).  The  signals  from  the  MAPK 
and  PI3K  cascades  are  routed  to  the  mTOR  pathway  through 
tuberous  sclerosis  (TSC2)  inactivation  (1).  Phosphorylated 
mTOR  activates  protein  S6  kinase  of  70  kDa  (p70S6K),  which 
inactivates  the  insulin  receptor  substrate  (IRS-f)  through 
a  negative  feedback  loop  (41).  A  detailed  description  of  the 
network  topology  is  provided  in  Supplementary  Material 
SI.  We  used  RPPA  (42-46)  to  determine  the  changes  in  the 
phosphorylation  of  proteins  in  the  IGFR  network  after  IGF-1 
stimulation.  To  account  for  the  intrinsic  variability  of  these 
assays,  all  experiments  were  performed  in  three  independent 


repeats.  Figure  IB  shows  the  time  courses  of  the  measured 
phosphorylated  proteins;  the  curves  show  the  protein  fold 
change  over  the  corresponding  controls  (Materials  and 
Methods).  After  IGF-1  stimulation,  the  level  of  phospho- 
AKT  peaked  at  30  minutes  (28-fold  increase)  and  then  settled 
toward  a  lower  level  at  120  minutes  (18-fold  increase). 
In  contrast,  signal  transduction  across  the  MAPK  cascade 
remained  essentially  unchanged  likely  as  a  result  of  MAPK 
constitutive  activation  driven  by  K-Ras  and  B-Raf  muta¬ 
tions  in  MDA-MB231  cells.  AKT  activation  triggered  glyco¬ 
gen  synthase  kinase  (GSK3)  and  TSC2  downregulation 
through  phosphorylation,  and  TSC2  inactivation  facilitated 
phospho-mTOR  and  phospho-p70S6K  upregulation.  Thus, 
the  levels  of  p-GSK3,  p-TSC2,  p-mTOR,  and  p-p70S6K  ini¬ 
tially  increased  and  then  adjusted  to  stationary  levels. 


Figure  2.  The  effect  of  MEK  inhibition 
on  IGFR  network  dynamics.  A,  protein 
profiles  of  IGF-1 -stimulated  MDA-MB231 
cells  predicted  by  the  trained  mass  action 
model.  Solid  red  lines,  the  protein  time 
courses  of  the  noninhibited  cells;  solid 
green  lines,  the  protein  profiles  of 
MEK-inhibited  cells.  B,  protein 
phosphorylation  in  MDA-MB231  cell 
lysates  after  stimulation  with  75  ng/mL 
IGE-1  detected  in  triplicate  by  RPPA. 
Solid  circles  and  red  lines,  the  protein 
time  courses  of  noninhibited  cells;  solid 
circles  and  green  lines,  the  protein 
profiles  of  cells  inhibited  with  MEK 
inhibitor  for  4  h. 
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Complementing  mass  action  modeling  with  PSO 

Mass  action  modeling  and  model  reduction.  To  predict 
the  dynamics  of  the  IGFR  network  after  IGF-1  stimulation  in 
MDA-MB231  cells,  we  developed  a  mass  action  ODE  model. 
Our  formulation  was  based  on  a  set  of  77  chemical  reactions 
that  described  the  consensus  activation  and  inhibition 
mechanisms  of  proteins  involved  in  the  IGFR  network.  The 
resulting  mass  action  model  was  structured  into  127  ODEs 
and  313  unknown  parameters.  To  decrease  the  complexity 
of  the  model,  we  developed  a  reduced  version  of  the  original 
model.  The  77  chemical  reactions  were  reduced  to  a  subset 
of  41  reactions  to  describe  the  simplified  interaction 
mechanisms  of  the  most  relevant  species  in  the  IGFR  net¬ 
work,  and  the  original  model  was  reduced  to  65  ODEs  and 
161  model  unknowns  (Supplementary  Material  S4).  We 
tested  and  validated  the  ability  of  the  reduced  model  to 
adequately  describe  IGER  dynamics  by  showing  that  the  pro¬ 
tein  profiles  predicted  by  the  reduced  model  matched  those 
generated  by  the  original  model  for  randomly  selected  sets  of 
parameters  (Supplementary  Material  S5).  Therefore,  through¬ 
out  the  article,  we  exclusively  used  the  reduced  model  to  pre¬ 
dict  the  dynamics  of  IGER  signaling  network. 

Model  training.  The  measured  time  course  data  of  pro¬ 
teins  in  MDA-MB231  cells  contain  relevant  information 
about  the  regulatory  loops  comprising  the  IGFR  network. 
To  exploit  this  information  to  optimally  inhibit  aberrant 
pathways,  we  used  PSO  to  fit  the  model  to  the  time  courses 
of  p-AKT,  p-MAPK,  P-GSK3,  p-mTOR,  p-p70S6K,  and  p-TSC2 
proteins  and  infer  the  161  unknown  parameters.  Studies 
published  in  the  literature  typically  use  only  two  or  three 
“readout”  molecules  to  fit  ODE  models  to  experimental  data 
and  infer  unknown  parameters  (14,  15,  26).  In  our  study,  we 
trained  our  model  using  six  readout  proteins  and  126  expe¬ 
rimental  data  points  combined  into  a  scalar  fitness. 

Because  of  the  substantial  degree  of  uncertainty  in  param¬ 
eter  estimation,  fitting  mass  action  models  to  the  qualitative 
data  measured  on  RPPA  required  the  identification  of  multi¬ 
ple  trajectories  that  equally  resembled  the  measured  protein 
profiles.  Using  the  integrative  mass  action  modeling  PSO 
procedure,  we  identified  10  sets  of  model  parameters  that 
equally  fit  the  measured  data  (Supplementary  File  SI).  We 
characterized  the  parameter  regimens  by  ranking  the  para¬ 
meters  according  to  their  coefficient  of  variation  (CV)  and 
found  that  69%  of  them  had  a  CV  smaller  than  1  (Fig.  1C). 
We  calculated  the  means  and  SD  of  the  identified  trajectories 
to  represent  the  entire  set  and  the  fitting  variability.  Figure  IB 
shows  the  mean  trajectories  and  the  fitting  variability 
identified  by  the  mass  action  model,  which  had  been  trained 
using  PSO  against  normahzed  protein  profiles  measured  on 
RPPA  after  IGF-1  stimulation  of  MDA-MB231  cells.  The 
simulation  results  indicated  that  the  integrative  procedure 
adequately  fit  the  time  courses  of  aU  measured  proteins. 

Model  testing.  To  determine  the  ability  of  the  trained 
model  to  correctly  generate  responses  to  perturbations  that 
have  not  been  explicitly  included  in  the  training  data  set,  we 
used  it  to  predict  the  dynamics  of  the  IGFR  network  after 
inhibition  of  MEK.  Figure  2A  shows  the  transient  IGFR  sig¬ 
naling  response  to  targeted  MEK  inhibition,  as  predicted  by 


the  trained  mass  action  model.  MEK  inhibition  led  to  signif¬ 
icant  dovmregulation  of  its  immediate  dovmstream  effector, 
p-MAPK.  Inhibition  of  p-MAPK  attenuated  the  inhibition  of 
IRS-1  through  direct  interaction  and  through  the  p70S6K 
feedback  loop.  Consequently,  p-AKT  was  upregulated.  Acti¬ 
vation  of  p-AKT  increased  the  level  of  p-TSC2  but  did  not 
affect  the  level  of  p-mTOR  or  GSK3.  Signals  from  the  MARK 
and  mTOR  cascades  were  integrated  into  the  p70S6K  path¬ 
way  and  led  to  p-p70S6K  downregulation. 

The  computational  results  were  experimentally  tested 
using  an  independent  set  of  252  data  points  measured  by 
RPPA.  Figure  2B  shows  the  levels  of  p-AKT,  p-MAPK, 
p-GSK3,  p-mTOR,  p-p70S6K,  and  p-TSC2  detected  in  tripli¬ 
cate  in  IGF-l-stimulated  MDA-MB231  cells  in  the  absence 
of  the  MEK  inhibitor  and  after  4  hours  of  incubation  with 
the  MEK  inhibitor.  The  experimental  data  indicated  that 
MEK  inhibition  increased  p-AKT  and  p-TSC2  levels,  de¬ 
creased  p-MAPK  and  p-p70S6K  levels,  and  slightly  decreased 
p-GSK3  levels  but  had  no  significant  effect  on  p-mTOR  levels. 
Despite  the  limited  discrepancy  between  the  computed  and 
measured  profiles  of  p-GSK3,  the  experimental  results  ade¬ 
quately  matched  those  predicted  by  the  model.  Therefore, 
the  trained  mass  action  model  correctly  predicted  the  effect 
of  MEK  inhibition  on  IGFR  dynamics. 

Predicting  inhibition  of  targeted  molecnles 

Individual  inhibition  of  targeted  moleeules.  To  deter¬ 
mine  how  to  select  drugs  with  the  ability  to  inhibit  the 
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Figure  3.  The  effect  of  single-molecule  inhibition  on  IGFR  signaling  in 
MDA-MB231  cells.  Differential  levels  of  proteins  in  single  molecule- 
inhibited  versus  noninhibited  MDA-MB231  cells  after  2  h  of  stimulation 
with  75  ng/mL  IGF-1  were  predicted  using  the  trained  mass  action  model. 
Numerical  values  were  converted  to  logio-  Blue,  inhibition;  white,  no 
variation;  red,  activation. 
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Figure  4.  Combined  inhibition  of 
MEK  and  PI3K  in  MDA-IVIB231 
cells.  A  and  B,  protein  levels  were 
detected  in  unstimulated  cells  in 
the  absence  of  inhibition  (columns 
1)  and  after  1  h  of  incubation  with 
the  MEK  and/or  PI3K  inhibitors 
(columns  2-4),  and  in  cells 
stimulated  with  IGF-1  in  the 
absence  of  inhibition  (column  5) 
and  after  1  h  of  incubation  with 
the  MEK  and/or  PI3K  inhibitors 
(columns  6-8).  Cells  were 
stimulated  with  75  ng/mL  IGF-1 
for  5  min  (A)  or  60  min  (B).  C  and 
D,  density  of  the  bands  after 
normalization  with  respect  to  actin. 
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pathways  measured  in  MDA-MB231  cells,  we  used  the 
trained  and  tested  mass  action  model  to  predict  the  re¬ 
sponse  of  the  IGFR  network  after  molecules  in  the  net¬ 
work  had  been  individually  inhibited  (Supplementary 
Material  S6).  Figure  3  and  Supplementary  Material  S6  show 
the  differential  levels  of  proteins  in  inhibited  versus  nonin- 
hibited  MDA-MB231  cells  after  IGF-1  stimulation,  as  pre¬ 
dicted  by  the  mass  action  model  for  3  of  the  10  sets  of 
parameters  inferred  using  PSO  (Supplementary  File  S2). 
These  three  sets  were  randomly  selected  to  repeat  the 
computational  analysis  in  triplicate.  The  modeling  results 
suggested  that  targeting  one  molecule  at  a  time  may  acti¬ 
vate  nontargeted  molecules,  likely  as  a  result  of  feedback 
loop  compensation.  Thus,  targeting  a  single  molecule  may 
not  be  sufficient  to  adequately  inhibit  aberrant  signaling. 
Although  inhibition  of  molecules  in  the  MAPK  pathway 


was  predicted  to  activate  the  PI3K/AKT  pathway,  inhibi¬ 
tion  of  intermediates  comprising  the  PI3K/AKT  pathway 
was  predicted  to  activate  the  MAPK  pathway.  Because 
these  pathways  are  often  upregulated  in  many  tumors 
(47),  the  combined  inhibition  of  the  MAPK  and  PI3K/ 
AKT  cascades  emerged  as  a  candidate  strategy  to  inhibit 
aberrant  signaling  in  MDA-MB231  cells. 

Combined  inhibition  of  targeted  molecules.  Predicting 
the  response  of  IGFR  networks  to  the  inhibition  of  individual 
molecules  may  not  necessarily  identify  optimal  drug  combi¬ 
nations  for  pharmacologic  intervention.  In  contrast,  perturb¬ 
ing  all  molecules  in  the  network  simultaneously  would 
identify  optimal  combinations  needed  to  inhibit  aberrant 
signaling.  From  a  computational  point  of  view,  the  effect  of 
small  interfering  RNAs  can  be  mimicked  by  varying  the  initial 
concentration  of  signaling  proteins.  The  effect  of  the  drug 
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inhibitor  can  be  approximately  simulated  by  varying  the 
values  of  rescaled  kinetic  rate  constants,  such  as  in  the  clas¬ 
sic  example  of  competitive  inhibition  (48).  Integrating  mass 
action  modeling  with  a  random  sampling  of  kinetic  con¬ 
stants  (inhibitors)  and  initial  protein  concentrations  (small 
interfering  RNAs)  within  predefined  intervals  of  values  would 
thus  provide  an  unbiased,  unsupervised  means  of  computa¬ 
tionally  predicting  the  effect  of  simultaneously  perturbing  all 
molecules  in  the  IGFR  network  (computational  procedures). 

Combinations  of  signaling  targets  were  identified  by  com¬ 
paring  the  model  parameters  inferred  using  PSO  from  data 
measured  in  MDA-MB231  ceUs  with  randomly  sampled  mod¬ 
el  parameters  that  could  restore  user-defined  signaling  out¬ 
put.  We  defined  the  signaling  network  characterized  by  the 


measured  time  courses  of  p-AKT,  p-MAPK,  and  p-p70S6K  as 
aberrant,  as  these  proteins  are  often  upregulated  in  many 
tumors  (47).  We  defined  the  state  at  which  p-AKT,  p-MAPK, 
and  p-p70S6K  levels  were  inhibited  by  at  least  5-fold  as 
user-defined  signaling  output.  To  obtain  reliable  results,  we 
identified  200  collections  of  combined  perturbations  (Supple¬ 
mentary  File  S3)  that  restored  the  user-defined  signaling 
changes  in  MDA-MB231  cells  for  the  sets  of  parameters  listed 
in  Supplementary  File  S2. 

The  most  influential  targets  were  scored  according  to 
the  absolute  value  of  the  median  deviation  to  the  SD  ratio 
(Supplementary  Table  SI).  The  computational  analysis 
was  repeated  in  triplicate  using  the  same  three  randomly 
selected  sets  of  parameters  used  to  predict  individual 
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Figure  5.  Combined  inhibition  of 
MEK  and  mTOR  in  MDA-MB231 
cells.  A  and  B,  protein  levels  were 
detected  in  unstimulated  cells  in 
the  absence  of  inhibition  (column  1) 
and  after  1  h  of  incubation  with 
the  MEK  and/or  mTOR  inhibitors 
(columns  2-4),  and  in  cells 
stimulated  with  IGF-1  in  the 
absence  of  inhibition  (column  5) 
and  after  1  h  of  incubation  with 
the  MEK  and/or  mTOR  inhibitors 
(columns  6-8).  Cells  were 
stimulated  with  75  ng/mL  IGF-1  for 
5  min  (A)  or  60  min  (B).  C  and  D, 
density  of  the  bands  after 
normalization  with  respect  to  actin. 
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inhibition  of  targeted  molecules.  Despite  being  ranked  in  a 
different  order,  the  top  five  targets  were  the  same  for  the 
three  sets.  All  targets  that  scored  as  influential  were  char¬ 
acterized  by  a  positive  median  deviation,  which  indicated 
activation  of  the  reactions  leading  to  inhibition  of  phos- 
phorylated  protein.  Because  p-IGFR,  p-IRS-1,  p-MEK,  p- 
MAPK,  and  p-AKT  were  scored  as  the  most  influential  tar¬ 
gets,  combined  inhibition  of  the  PI3K/AKT  and  MAPK 
pathways  was  predicted  to  optimally  facilitate  disruption 
of  the  loops  responsible  for  aberrant  signaling  in  MDA- 
MB231  cells. 

Experimental  validation  of  modeling  predictions 

The  effect  of  drug  combinations  on  the  IGFR  network  in 
the  MDA-MB231  cell  line.  We  used  immunoblotting  to  de¬ 
termine  whether  the  combined  inhibition  of  the  MAPK  and 
PI3K/AKT  pathways  would  decrease  the  levels  of  p-AKT, 
p-MAPK,  and  p-p70S6K,  and  minimize  changes  in  phosphory¬ 
lation  of  other  signaling  proteins  in  the  network.  We  also 
tested  the  combination  of  MEK  and  mTOR  inhibitors  to 
determine  whether  targeting  pathways  that  differ  from  the 
predicted  optimal  combination  would  restore  user-defined 
signaling  changes  in  the  MDA-MB231  cells. 

Figures  4A  and  B  show  the  levels  of  p-AKT,  p-MAPK, 
p-GSK3,  p-mTOR,  p-p70S6K,  and  p-TSC2  detected  in  the  absence 
of  inhibition,  and  after  1  hour  of  incubation  with  the  MEK 
and/ or  PI3K  inhibitors  in  unstimulated  cells  and  IGF-stimulated 
cells.  The  experimental  data  indicated  that,  in  IGF-stimulated 
cells  inhibited  with  MEK  and  PI3K  inhibitors  (column  8), 
the  levels  of  all  phosphorylated  proteins  were  significantly  de¬ 
creased  compared  with  those  of  the  corresponding  proteins 
detected  in  noninhibited,  IGE-stimulated  cells  (column  5). 

Figure  5A  and  B  show  the  levels  of  p-AKT,  p-MAPK, 
p-GSK3,  p-mTOR,  p-p70S6K,  and  p-TSC2  measured  in  the 
absence  of  inhibition,  and  after  1  hour  of  incubation  with 
the  MEK  and/or  mTOR  inhibitors  in  unstimulated  cells  and 
IGF-stimulated  cells.  The  experimental  data  indicated  that 
p-MAPK,  p-mTOR,  and  p-p70S6K  levels  in  IGF-stimulated  cells 
inhibited  with  MEK  and  mTOR  inhibitors  (column  8)  were 
decreased  compared  with  those  of  the  corresponding  proteins 
in  noninhibited,  IGF-l-stimulated  cells  (column  5).  However, 
the  p-AKT,  p-GSK3,  and  p-TSC2  levels  were  increased. 

Supplementary  Table  S2  shows  the  qualitative  comparison 
between  the  measured  and  predicted  differential  levels 
of  phosphorylated  proteins  in  IGF-stimulated  cells  in  the  ab¬ 
sence  of  inhibition,  and  after  1  hour  of  incubation  with  MEK 
and  PI3K  inhibitors  or  MEK  and  mTOR  inhibitors.  Note  that 
the  experimental  results  agree  with  the  modehng  predictions 
for  both  drug  combinations.  Therefore,  as  predicted  by  the 
mass  action  model,  combined  inhibition  of  the  MAPK  and 
PI3K/AKT  pathways  optimally  inhibited  aberrant  networks, 
but  combinations  of  MEK  and  mTOR  inhibitors  did  not 
decrease  the  levels  of  p-AKT,  p-MAPK,  and  p-p70S6K,  and 
increased  phosphorylation  of  nontargeted  protein. 

Cell  sensitivity  to  drug  combinations.  Optimal  inhibition 
of  abnormal  signaling  networks  must  inhibit  regulatory  loops 
and  redundant  bypass  to  ultimately  overcome  the  mecha¬ 
nism  of  feedback  compensation  that  ensures  cancer  cell 
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Figure  6.  Response  curve  of  MDA-MB231  cells  to  dose  concentration 
of  drug  inhibitors.  A,  cells  were  left  nontreated  as  a  control  and 
incubated  with  LY294002  at  its  EC25  (3.8  pmol/L)  or  rapamycin  at  its  EC25 
(0.1  nmol/L)  in  combination  with  U0126  at  a  concentration  of  0.5  to 
50  pmol/L.  B,  cells  were  left  nontreated  as  a  control  and  incubated 
only  with  rapamycin  at  concentrations  of  0.1  to  100  nmol/L  or  with  a 
combination  of  U0126  at  its  EC25  (3.5  pmol/L)  and  rapamycin  at  a 
concentration  of  0.5  to  50  pmol/L.  Absorbance  was  normalized  with 
respect  to  the  value  detected  for  the  controls  and  was  expressed  as  a 
percentage.  Solid  circles,  mean  of  normalized  absorbance;  bars,  SD. 


viability.  To  quantify  the  sensitivity  of  the  MDA-MB231  cell 
line  to  MEK  and  PI3K,  or  MEK  and  mTOR  inhibition,  we  used 
cell  viability  assays. 

Figure  6A  shows  the  effect  of  drug  combinations  on  the 
viability  of  MDA-MB231  cells,  which  was  measured  as  the 
normalized  absorbance  of  viable  cells  as  a  function  of  in¬ 
creasing  MEK  inhibition  (U0126)  in  combination  with  PI3K 
inhibitor  (LY294002)  or  mTOR  inhibitor  (rapamycin).  The 
experimental  results  indicated  that  inhibiting  cells  with  a 
combination  of  LY294002  and  increasing  concentrations  of 
U0126  resulted  in  a  dose-dependent  decrease  in  cell  viability. 
In  contrast,  cells  treated  with  a  combination  of  rapamycin 
and  increasing  concentrations  of  U0126  showed  no  change 
in  cell  proliferation  up  to  1  pmol/L  of  U0126  followed  by  par¬ 
tial  rescue  of  cells  with  rapamycin.  Combined  MEK-PI3K 
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inhibition  monotonically  decreased  cell  viability,  likely  as  a 
result  of  the  optimal  inhibition  of  the  signaling  pathways  that 
led  to  inactivation  of  phosphorylated  proteins  (Fig.  4).  In  con¬ 
trast,  combined  MEK-mTOR  inhibition  increased  cell  viabil¬ 
ity  at  low  concentrations  of  the  U0126  inhibitor,  likely  as  a 
result  of  the  nonoptimal  inhibition  of  the  signaling  network 
that  led  to  activation  of  p-AKT  (Fig.  5).  At  high  concentra¬ 
tions  of  the  U0126  inhibitor,  cell  viability  was  significantly 
decreased  for  both  drug  combinations,  likely  as  a  result  of 
U0126  inhibitor  toxicity  (Supplementary  Material  S3). 

To  test  whether  the  combination  of  MEK  and  mTOR 
inhibitors  rescued  cell  proliferation  by  activation  of  p-AKT, 
we  performed  cell  viability  assays  with  rapamycin  alone  or 
with  rapamycin  in  combination  with  U2016.  The  experi¬ 
mental  results  shown  in  Fig.  6B  indicated  that  addition  of 
a  MEK  inhibitor  to  cells  treated  with  rapamycin  increased 
cell  viability  from  40%  to  73%  and  rescued  cells  from  cell 
death.  Therefore,  the  experimental  results  suggested  that 
optimal  inhibition  of  aberrant  signaling  through  combined 
inhibition  of  the  MARK  and  P13K  pathways  was  correlated 
with  decreased  cell  viabihty.  In  contrast,  nonoptimal  com¬ 
bined  targeted  inhibition  led  to  inadequate  inhibition  of 
the  signaling  networks  and  increased  cell  viability. 

Discussion 

Integrating  mass  action  modeling  with  optimization 
schemes  is  a  quantitative  approach  to  train  ODE  models  us¬ 
ing  experimental  data  and  identify  optimal  drug  combina¬ 
tions  that  can  inhibit  signaling  networks.  PSO  converged 
to  more  optimal  solutions  than  did  other  optimization  algo¬ 
rithms,  including  simulated  annealing  and  genetic  algorithms. 
Supplementary  Table  S3  summarizes  the  performance  of  the 
three  algorithms  in  training  the  reduced  mass  action  model 
against  time  courses  of  proteins  (Supplementary  FUe  S4).  Each 
simulation  was  repeated  three  times  with  different  random 
seeds  of  the  reduced  model  unknown  parameters. 

The  most  simple  and  intuitive  strategy  to  inhibit  aberrant 
networks  consists  of  inhibiting  the  input  sources  that  trigger 
signal  transduction.  Thus,  individual  inhibition  of  IGFR  could 
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